Sample records for parallel programming paradigm

  1. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Hao-Qiang; an Mey, Dieter; Hatay, Ferhat F.

    2003-01-01

Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
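
    A minimal sketch of the multilevel paradigm this record describes, combining MPI message passing across nodes with OpenMP loop-level threading within a node. The reduction workload below is invented for illustration; it is not the CFD benchmark used in the study.

    ```c
    /* Hybrid MPI+OpenMP sketch: OpenMP threads within an SMP node,
     * MPI messages across nodes. Illustrative only. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;
        /* Request thread support so OpenMP regions may coexist with MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 0.0;
        /* Fine-grained, loop-level parallelism inside the node ... */
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < 1000000; i++)
            local += 1.0 / (double)(i + 1);

        /* ... coarse-grained message passing across the nodes. */
        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %f\n", global);
        MPI_Finalize();
        return 0;
    }
    ```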

  2. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; an Mey, Dieter; Hatay, Ferhat F.

    2003-01-01

With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.

  3. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.

    2003-01-01

Parallel programming paradigms include process-level parallelism, thread-level parallelism, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). The analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.

  4. What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

    2004-01-01

With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
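
    The "shared memory arena" central to MLP can be pictured with standard POSIX facilities. The sketch below (error handling omitted, the /arena_demo name hypothetical) shows one process writing into an arena that another process reads; plain shm_open/mmap is assumed here only as a stand-in for whatever system mechanism Taft's MLP actually used.

    ```c
    /* Hedged sketch of an MLP-style shared memory arena: a region all
     * processes can read and write, via POSIX shm + fork. Link with
     * -lrt on some systems. Error checks omitted for brevity. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Create the shared arena visible to all worker processes. */
        int fd = shm_open("/arena_demo", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(double));
        double *arena = mmap(NULL, sizeof(double), PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
        *arena = 0.0;

        if (fork() == 0) {           /* child: write into the arena */
            *arena = 42.0;
            _exit(0);
        }
        wait(NULL);                  /* parent: read what the child wrote */
        printf("arena value = %f\n", *arena);
        munmap(arena, sizeof(double));
        shm_unlink("/arena_demo");
        return 0;
    }
    ```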

  5. Mechanism to support generic collective communication across a variety of programming models

    DOEpatents

Almasi, Gheorghe [Ardsley, NY]; Dozsa, Gabor [Ardsley, NY]; Kumar, Sameer [White Plains, NY]

    2011-07-19

    A system and method for supporting collective communications on a plurality of processors that use different parallel programming paradigms, in one aspect, may comprise a schedule defining one or more tasks in a collective operation, an executor that executes the task, a multisend module to perform one or more data transfer functions associated with the tasks, and a connection manager that controls one or more connections and identifies an available connection. The multisend module uses the available connection in performing the one or more data transfer functions. A plurality of processors that use different parallel programming paradigms can use a common implementation of the schedule module, the executor module, the connection manager and the multisend module via a language adaptor specific to a parallel programming paradigm implemented on a processor.
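
    The components named in this patent abstract can be made concrete as a set of C interfaces. Every name and signature below is hypothetical, sketched only to illustrate the schedule/executor/multisend/connection-manager division of labor; it is not the patented API.

    ```c
    /* Hypothetical C rendering of the patent's components. */
    #include <stddef.h>

    typedef struct Connection Connection;   /* opaque transport handle */

    typedef struct {
        /* Yields the next task (phase) of a collective operation. */
        int (*next_task)(void *state);
    } Schedule;

    typedef struct {
        /* Transfers the data for one task over a given connection. */
        int (*send)(Connection *conn, const void *buf, size_t len);
    } Multisend;

    typedef struct {
        /* Picks an idle connection for the multisend module to use. */
        Connection *(*acquire)(void);
    } ConnectionManager;

    typedef struct {
        /* Drives tasks to completion; paradigm-specific language
         * adaptors would wrap this common core. */
        Schedule          *schedule;
        Multisend         *multisend;
        ConnectionManager *conn_mgr;
    } Executor;
    ```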

  6. The paradigm compiler: Mapping a functional language for the connection machine

    NASA Technical Reports Server (NTRS)

    Dennis, Jack B.

    1989-01-01

    The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.

  7. Parallel Programming Paradigms

    DTIC Science & Technology

    1987-07-01

Distribution of this report is unlimited. Supported in part by Grant No. 8416878 and by Office of Naval Research Contracts No. N00014-86-K-0264 and No. N00014-85-K-0328. …processors to fetch from the same memory cell (list head) and thus seems to favor a shared memory implementation [37]. In this dissertation, we…

  8. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  9. Genetic Parallel Programming: design and implementation.

    PubMed

    Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

    2006-01-01

    This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

  10. Evolving binary classifiers through parallel computation of multiple fitness cases.

    PubMed

    Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

    2005-06-01

    This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.

  11. Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC

    NASA Astrophysics Data System (ADS)

    Alruwaili, Manal

With growing technology, processor counts are becoming massive, and current supercomputer processing power will be available on desktops in the next decade. For mass-scale application software development on the massively parallel computing available on desktops, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massive parallel computing and distributed memory models while retaining user-friendliness. Currently available object-oriented languages for massively parallel computing, such as Chapel, X10, and UPC++, exploit distributed computing, data-parallel computing, and thread-level parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they do not incorporate: 1) extensions for object distribution to exploit the PGAS model; 2) the flexibility of migrating or cloning an object between places to exploit load balancing; and 3) the programming paradigms that result from integrating data- and thread-level parallelism with object distribution. In the proposed thesis, I compare different languages in the PGAS model; propose new constructs that extend C++ with object distribution, object migration, and object cloning; and integrate PGAS-based process constructs with these extensions on distributed objects. A new paradigm, MIDD (Multiple Invocation Distributed Data), is also presented, in which different copies of the same class can be invoked and work concurrently on different elements of a distributed data structure using remote method invocations. I present the new constructs, their grammar, and their behavior, and explain them using simple programs that utilize these constructs.

  12. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2016-01-01

    In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
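
    One standard way to obtain shared memory accesses from within MPI, in the spirit of the SMPI paradigm measured here, is the MPI-3 shared-memory window facility. A minimal sketch assuming that mechanism; the mini-apps themselves are not reproduced in this record.

    ```c
    /* MPI-3 shared-memory window sketch: ranks on one node load/store
     * a common buffer directly, with no message traffic. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Split off a communicator of ranks that share physical memory. */
        MPI_Comm node;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node);
        int nrank;
        MPI_Comm_rank(node, &nrank);

        /* Allocate a window every rank on the node can access directly. */
        double *base;
        MPI_Win win;
        MPI_Win_allocate_shared(nrank == 0 ? sizeof(double) : 0,
                                sizeof(double), MPI_INFO_NULL, node,
                                &base, &win);
        MPI_Aint size; int disp;
        double *shared;
        MPI_Win_shared_query(win, 0, &size, &disp, &shared);

        if (nrank == 0) *shared = 3.14;   /* direct store, no message */
        MPI_Barrier(node);
        printf("rank %d sees %f\n", nrank, *shared);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }
    ```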

  13. Testing New Programming Paradigms with NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developed aiming to scale up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although the use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as the lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with the MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and the PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
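
    The outer-loop OpenMP strategy described above, in miniature: the directive goes on the outermost loop of a BT/SP-like triple nest to maximize granularity. The array names are invented; this is not the actual NPB source.

    ```c
    /* Outermost-loop OpenMP parallelization of a stencil-style nest. */
    #define N 64
    static double u[N][N][N], rhs[N][N][N];

    void relax(void)
    {
        /* Parallelize only the outermost dimension; the inner loops
         * stay sequential within a thread, preserving cache locality. */
        #pragma omp parallel for
        for (int k = 1; k < N - 1; k++)
            for (int j = 1; j < N - 1; j++)
                for (int i = 1; i < N - 1; i++)
                    u[k][j][i] += 0.25 * rhs[k][j][i];
    }
    ```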

  14. Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2002-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that, for this class of applications, ordering significantly improves overall performance on both distributed and distributed shared-memory systems; that cache reuse may be more important than reducing communication; that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution; and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.
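
    For reference, one PCG iteration has the following shape; spmv() and precond() are placeholder declarations standing in for the sparse kernels whose memory behavior the ordering strategies target. A sketch, not the study's code.

    ```c
    /* Skeleton of one preconditioned CG iteration. */
    void spmv(int n, const double *x, double *y);     /* y = A*x (assumed) */
    void precond(int n, const double *r, double *z);  /* z = M^{-1} r (assumed) */

    static double dot(int n, const double *a, const double *b)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += a[i] * b[i];
        return s;
    }

    void pcg_step(int n, double *x, double *r, double *z, double *p,
                  double *q, double *rho_old)
    {
        precond(n, r, z);                 /* apply ILU(0)-style preconditioner */
        double rho  = dot(n, r, z);
        double beta = rho / *rho_old;
        for (int i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
        spmv(n, p, q);                    /* the dominant sparse kernel */
        double alpha = rho / dot(n, p, q);
        for (int i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
        *rho_old = rho;
    }
    ```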

  15. Comparison of two paradigms for distributed shared memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.

    1990-08-01

The paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model as used in Orca, a programming language specially designed for loosely coupled computing systems, and the Shared Virtual Memory model. For both paradigms the authors have implemented two systems, one using only point-to-point messages, the other using broadcasting as well. They briefly describe these two paradigms and their implementations. Then they compare their performance on four applications: the traveling salesman problem, alpha-beta search, matrix multiplication and the all pairs shortest paths problem. The measurements show that both paradigms can be used efficiently for programming large-grain parallel applications. Significant speedups were obtained on all applications. The unstructured Shared Virtual Memory paradigm achieves the best absolute performance, although this is largely due to the preliminary nature of the Orca compiler used. The structured shared data-object model achieves the highest speedups and is much easier to program and to debug.

  16. Enhancing instruction scheduling with a block-structured ISA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Melvin, S.; Patt, Y.

It is now generally recognized that not enough parallelism exists within the small basic blocks of most general purpose programs to satisfy high performance processors. Thus, a wide variety of techniques have been developed to exploit instruction level parallelism across basic block boundaries. In this paper we discuss some previous techniques along with their hardware and software requirements. Then we propose a new paradigm for an instruction set architecture (ISA): block-structuring. This new paradigm is presented, its hardware and software requirements are discussed and the results from a simulation study are presented. We show that a block-structured ISA utilizes both dynamic and compile-time mechanisms for exploiting instruction level parallelism and has significant performance advantages over a conventional ISA.

  17. Simulation/Emulation Techniques: Compressing Schedules With Parallel (HW/SW) Development

    NASA Technical Reports Server (NTRS)

    Mangieri, Mark L.; Hoang, June

    2014-01-01

NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms to NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi-Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, Commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of simulators and emulators, NASA has achieved cost effective paradigms that are currently serving the Orion program effectively. Elements of long lead custom hardware on the Orion program have necessitated early use of simulators and emulators in advance of deliverable hardware to achieve parallel design and development on a compressed schedule.

  18. Parallel Computing for Probabilistic Response Analysis of High Temperature Composites

    NASA Technical Reports Server (NTRS)

    Sues, R. H.; Lua, Y. J.; Smith, M. D.

    1994-01-01

    The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.

  19. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.

  20. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    PubMed

    Shrimankar, D D; Sathe, S R

    2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution, and that it increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are striking and consistent across a large variety of input data files. We have developed our own load balancing and cache optimization techniques for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.
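
    The dependence structure that makes alignment recurrences hard to parallelize, and the opening that tile- or diagonal-based methods exploit, shows up in a minimal OpenMP sketch: all cells on one anti-diagonal are mutually independent. Simplified scoring and invented names; this is not the authors' implementation, which tiles at a coarser grain.

    ```c
    /* Anti-diagonal parallelization of a simple alignment recurrence.
     * H is (n+1) x (m+1); the caller allocates it and initializes
     * row 0 and column 0 with gap penalties. */
    #define MAX(a, b) ((a) > (b) ? (a) : (b))

    void align(int n, int m, const char *s, const char *t, int **H)
    {
        for (int d = 2; d <= n + m; d++) {      /* diagonals in order */
            #pragma omp parallel for            /* cells on d are independent */
            for (int i = 1; i <= n; i++) {
                int j = d - i;
                if (j < 1 || j > m) continue;
                int match = (s[i - 1] == t[j - 1]) ? 1 : -1;
                H[i][j] = MAX(H[i - 1][j - 1] + match,
                              MAX(H[i - 1][j] - 1, H[i][j - 1] - 1));
            }
        }
    }
    ```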

  1. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    PubMed Central

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution, and that it increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are striking and consistent across a large variety of input data files. We have developed our own load balancing and cache optimization techniques for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  2. On Parallel Software Engineering Education Using Python

    ERIC Educational Resources Information Center

    Marowka, Ami

    2018-01-01

Python is gaining popularity in academia as the preferred language to teach novices serial programming. The syntax of Python is clean, easy, and simple to understand. At the same time, it is a high-level programming language that supports multiple programming paradigms such as imperative, functional, and object-oriented. Therefore, by default, it is…

3. Visual and Audio Presentation in Machine Programed Instruction. Final Report.

    ERIC Educational Resources Information Center

Allen, William H.

This study was part of a larger research program aimed toward development of paradigms of message design. Objectives of three parallel experiments were to evaluate interactions of presentation mode, program type, and content as they affect learner characteristics. Each experiment used 18 treatments in a factorial design with randomly selected…

4. Parallel community climate model: Description and user's guide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.

  5. A sample implementation for parallelizing Divide-and-Conquer algorithms on the GPU.

    PubMed

    Mei, Gang; Zhang, Jiayin; Xu, Nengxiong; Zhao, Kunyang

    2018-01-01

The strategy of Divide-and-Conquer (D&C) is one of the frequently used programming patterns to design efficient algorithms in computer science, and it has been parallelized on shared memory systems and distributed memory systems. Tzeng and Owens specifically developed a generic paradigm for parallelizing D&C algorithms on modern Graphics Processing Units (GPUs). In this paper, by following the generic paradigm proposed by Tzeng and Owens, we provide a new and publicly available GPU implementation of the famous D&C algorithm, QuickHull, to give a sample and guide for parallelizing D&C algorithms on the GPU. The experimental results demonstrate the practicality of our sample GPU implementation. Our research objective in this paper is to present a sample GPU implementation of a classical D&C algorithm to help interested readers develop their own efficient GPU implementations with less effort.
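
    Since the record reproduces no code, here is a CPU-side analogue of the D&C pattern using OpenMP tasks. The paper's contribution is a GPU (CUDA) implementation, so this sketch only illustrates the divide/conquer/merge shape that gets mapped onto the GPU.

    ```c
    /* Generic D&C shape with OpenMP tasks: divide, recurse in
     * parallel, wait, then merge. Base-case and merge work elided. */
    void solve(int *a, int lo, int hi)
    {
        if (hi - lo < 1024) {         /* small subproblem: solve directly */
            /* base-case work on a[lo..hi) */
            return;
        }
        int mid = lo + (hi - lo) / 2;
        /* divide: the two halves are independent, so run them as tasks */
        #pragma omp task shared(a)
        solve(a, lo, mid);
        #pragma omp task shared(a)
        solve(a, mid, hi);
        #pragma omp taskwait          /* conquer: wait for both halves */
        /* merge step for a[lo..hi) would go here */
    }
    ```

    The entry call would sit inside a `#pragma omp parallel` region under `#pragma omp single`, so that one thread seeds the task tree and the remaining threads pick up the spawned tasks.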

  6. A Comparison of PETSC Library and HPF Implementations of an Archetypal PDE Computation

    NASA Technical Reports Server (NTRS)

    Hayder, M. Ehtesham; Keyes, David E.; Mehrotra, Piyush

    1997-01-01

Two paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation, a nonlinear, structured-grid partial differential equation boundary value problem, using the same algorithm on the same hardware. Both paradigms, parallel libraries represented by Argonne's PETSc, and parallel languages represented by the Portland Group's HPF, are found to be easy to use for this problem class, and both are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under either paradigm includes specification of the data partitioning (corresponding to a geometrically simple decomposition of the domain of the PDE). Programming in SPMD style for the PETSc library requires writing the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm, introducing concurrency through subdomain blocking (an effort similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Correctness and scalability are cross-validated on up to 32 nodes of an IBM SP2.

  7. Parallel Large-scale Semidefinite Programming for Strong Electron Correlation: Using Correlation and Entanglement in the Design of Efficient Energy-Transfer Mechanisms

    DTIC Science & Technology

    2014-09-24

…(i) understanding how nature uses strong electron correlation for efficient energy transfer, particularly in photosynthesis and bioluminescence, and (ii) providing an innovative paradigm for energy transfer in photovoltaic…

  8. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
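
    The SPMV kernel that dominates each CG iteration, in compressed sparse row form with the row loop parallelized; a sketch of where the orderings act, since the chosen permutation determines the locality of the x[idx[k]] gather. Not the paper's code.

    ```c
    /* y = A*x with A in compressed sparse row (CSR) form. */
    void spmv_csr(int n, const int *ptr, const int *idx, const double *val,
                  const double *x, double *y)
    {
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int k = ptr[i]; k < ptr[i + 1]; k++)
                s += val[k] * x[idx[k]];  /* gather: reordering improves reuse */
            y[i] = s;
        }
    }
    ```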

  9. The parallel programming of voluntary and reflexive saccades.

    PubMed

    Walker, Robin; McSorley, Eugene

    2006-06-01

    A novel two-step paradigm was used to investigate the parallel programming of consecutive, stimulus-elicited ('reflexive') and endogenous ('voluntary') saccades. The mean latency of voluntary saccades, made following the first reflexive saccades in two-step conditions, was significantly reduced compared to that of voluntary saccades made in the single-step control trials. The latency of the first reflexive saccades was modulated by the requirement to make a second saccade: first saccade latency increased when a second voluntary saccade was required in the opposite direction to the first saccade, and decreased when a second saccade was required in the same direction as the first reflexive saccade. A second experiment confirmed the basic effect and also showed that a second reflexive saccade may be programmed in parallel with a first voluntary saccade. The results support the view that voluntary and reflexive saccades can be programmed in parallel on a common motor map.

  10. The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

    NASA Technical Reports Server (NTRS)

Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Mark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete

    1998-01-01

    Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.

  11. Orion FSW V and V and Kedalion Engineering Lab Insight

    NASA Technical Reports Server (NTRS)

    Mangieri, Mark L.

    2010-01-01

NASA, along with its prime Orion contractor and its subcontractors, is adapting an avionics system paradigm borrowed from the manned commercial aircraft industry for use in manned space flight systems. Integrated Modular Avionics (IMA) techniques have been proven as a robust avionics solution for manned commercial aircraft (B737/777/787, MD 10/90). This presentation will outline current approaches to adapt IMA, along with its heritage FSW V&V paradigms, into NASA's manned space flight program for Orion. NASA's Kedalion engineering analysis lab is on the forefront of validating many of these contemporary IMA-based techniques. Kedalion has already validated many of the proposed Orion FSW V&V paradigms using Orion's precursory Flight Test Article (FTA) Pad Abort 1 (PA-1) program. The Kedalion lab will evolve its architectures, tools, and techniques in parallel with the evolving Orion program.

  12. Execution environment for intelligent real-time control systems

    NASA Technical Reports Server (NTRS)

    Sztipanovits, Janos

    1987-01-01

    Modern telerobot control technology requires the integration of symbolic and non-symbolic programming techniques, different models of parallel computations, and various programming paradigms. The Multigraph Architecture, which has been developed for the implementation of intelligent real-time control systems is described. The layered architecture includes specific computational models, integrated execution environment and various high-level tools. A special feature of the architecture is the tight coupling between the symbolic and non-symbolic computations. It supports not only a data interface, but also the integration of the control structures in a parallel computing environment.

  13. Programming model for distributed intelligent systems

    NASA Technical Reports Server (NTRS)

    Sztipanovits, J.; Biegl, C.; Karsai, G.; Bogunovic, N.; Purves, B.; Williams, R.; Christiansen, T.

    1988-01-01

    A programming model and architecture which was developed for the design and implementation of complex, heterogeneous measurement and control systems is described. The Multigraph Architecture integrates artificial intelligence techniques with conventional software technologies, offers a unified framework for distributed and shared memory based parallel computational models and supports multiple programming paradigms. The system can be implemented on different hardware architectures and can be adapted to strongly different applications.

  14. Real-Time MENTAT programming language and architecture

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.; Silberman, Ami; Liu, Jane W. S.

    1989-01-01

Real-time MENTAT, a programming environment designed to simplify the task of programming real-time applications in distributed and parallel environments, is described. It is based on the same data-driven computation model and object-oriented programming paradigm as MENTAT. It provides an easy-to-use mechanism to exploit parallelism, language constructs for the expression and enforcement of timing constraints, and run-time support for scheduling and executing real-time programs. The real-time MENTAT programming language is an extended C++. The extensions are added to facilitate automatic detection of data flow and generation of data flow graphs, to express the timing constraints of individual granules of computation, and to provide scheduling directives for the runtime system. A high-level view of the real-time MENTAT system architecture and programming language constructs is provided.

  15. User-Defined Data Distributions in High-Level Programming Languages

    NASA Technical Reports Server (NTRS)

    Diaconescu, Roxana E.; Zima, Hans P.

    2006-01-01

One of the characteristic features of today's high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.
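
    A user-defined block distribution ultimately induces an owner mapping from global indices to locales. Chapel expresses this at the language level; the C helper below, with invented names, only illustrates the mapping itself.

    ```c
    /* Owner of global index g when n elements are block-distributed
     * over p locales; the first n % p locales hold one extra element. */
    int owner(long g, long n, int p)
    {
        long base = n / p, extra = n % p;
        long cut = (base + 1) * extra;     /* elements held by the "big" locales */
        return (g < cut) ? (int)(g / (base + 1))
                         : (int)(extra + (g - cut) / base);
    }
    ```

    For example, with n = 10 and p = 3 the mapping assigns indices 0-3 to locale 0 and splits the remaining six indices evenly between locales 1 and 2.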

  16. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    NASA Technical Reports Server (NTRS)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  17. Development for SSV on a parallel processing system (PARAGON)

    NASA Astrophysics Data System (ADS)

    Gothard, Benny M.; Allmen, Mark; Carroll, Michael J.; Rich, Dan

    1995-12-01

A goal of the surrogate semi-autonomous vehicle (SSV) program is to have multiple vehicles navigate autonomously and cooperatively with other vehicles. This paper describes the process and tools used in porting UGV/SSV (unmanned ground vehicle) autonomous mobility and target recognition algorithms from a SISD (single instruction single data) processor architecture (i.e., a Sun SPARC workstation running C/UNIX) to a MIMD (multiple instruction multiple data) parallel processor architecture (i.e., PARAGON, a parallel set of i860 processors running C/UNIX). It discusses the gains in performance and the pitfalls of such a venture. It also examines the merits of this processor architecture (based on this conceptual prototyping effort) and programming paradigm to meet the final SSV demonstration requirements.

  18. GPU accelerated particle visualization with Splotch

    NASA Astrophysics Data System (ADS)

    Rivi, M.; Gheller, C.; Dykes, T.; Krokos, M.; Dolag, K.

    2014-07-01

Splotch is a rendering algorithm for exploration and visual discovery in particle-based datasets coming from astronomical observations or numerical simulations. The strengths of the approach are production of high quality imagery and support for very large-scale datasets through an effective mix of the OpenMP and MPI parallel programming paradigms. This article reports our experiences in re-designing Splotch for exploiting emerging HPC architectures nowadays increasingly populated with GPUs. A performance model is introduced to guide our re-factoring of Splotch. A number of parallelization issues are discussed, in particular relating to race conditions and workload balancing, towards achieving optimal performance. Our implementation was accomplished by using the CUDA programming paradigm. Our strategy is founded on novel schemes achieving optimized data organization and classification of particles. We deploy a reference cosmological simulation to present performance results on acceleration gains and scalability. We finally outline our vision for future work developments including possibilities for further optimizations and exploitation of hybrid systems and emerging accelerators.

  19. Paramedir: A Tool for Programmable Performance Analysis

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    Performance analysis of parallel scientific applications is time consuming and requires great expertise in areas such as programming paradigms, system software, and computer hardware architectures. In this paper we describe a tool that facilitates the programmability of performance metric calculations thereby allowing the automation of the analysis and reducing the application development time. We demonstrate how the system can be used to capture knowledge and intuition acquired by advanced parallel programmers in order to be transferred to novice users.

  20. Parallel computation and the basis system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, G.R.

    1993-05-01

A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communications costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.
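
    The master-and-slaves pattern the report describes, sketched with MPI as a modern stand-in for its Basis/PVM library calls; the per-subdomain work is elided.

    ```c
    /* Master-and-slaves skeleton: rank 0 parcels out subdomain
     * indices; the other ranks compute on their assigned patch. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                   /* master: assign subdomains */
            for (int s = 1; s < size; s++)
                MPI_Send(&s, 1, MPI_INT, s, 0, MPI_COMM_WORLD);
        } else {                           /* slave: receive assignment */
            int subdomain;
            MPI_Recv(&subdomain, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            /* ... compute on the assigned subdomain, exchange halos ... */
        }
        MPI_Finalize();
        return 0;
    }
    ```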

  1. A Linked-Cell Domain Decomposition Method for Molecular Dynamics Simulation on a Scalable Multiprocessor

    DOE PAGES

    Yang, L. H.; Brooks III, E. D.; Belak, J.

    1992-01-01

A molecular dynamics algorithm for performing large-scale simulations using the Parallel C Preprocessor (PCP) programming paradigm on the BBN TC2000, a massively parallel computer, is discussed. The algorithm uses a linked-cell data structure to obtain the near neighbors of each atom as time evolves. Each processor is assigned to a geometric domain containing many subcells and the storage for that domain is private to the processor. Within this scheme, the interdomain (i.e., interprocessor) communication is minimized.
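
    The linked-cell data structure in miniature: atoms are binned into subcells so that neighbor searches visit only adjacent cells. A serial sketch with assumed dimensions and names; not the PCP/BBN TC2000 code.

    ```c
    /* Linked-cell binning: head[] holds the first atom of each cell,
     * next[] chains the remaining atoms of that cell. Assumes a cubic
     * box of side L with positions in [0, L). */
    #define NC   8                 /* cells per dimension (assumed) */
    #define NMAX 4096              /* maximum atom count (assumed)  */

    int head[NC][NC][NC];          /* first atom in each cell, -1 if empty */
    int next[NMAX];                /* next atom in the same cell's list    */

    void build_cells(int n, const double (*pos)[3], double L)
    {
        for (int i = 0; i < NC; i++)
            for (int j = 0; j < NC; j++)
                for (int k = 0; k < NC; k++)
                    head[i][j][k] = -1;

        for (int a = 0; a < n; a++) {      /* push atom a onto its cell */
            int ci = (int)(pos[a][0] / L * NC);
            int cj = (int)(pos[a][1] / L * NC);
            int ck = (int)(pos[a][2] / L * NC);
            next[a] = head[ci][cj][ck];
            head[ci][cj][ck] = a;
        }
    }
    ```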

  2. Performance Evaluation of Parallel Algorithms and Architectures in Concurrent Multiprocessor Systems

    DTIC Science & Technology

    1988-09-01

HEP and Other Parallel Processors, Report No. ANL-83-97, Argonne National Laboratory, Argonne, Ill., 1983. [19] Davidson, G. S., "A Practical Paradigm for…" IEEE Comp. Soc., 1986. [24] Peir, Jih-kwon, and D. Gajski, "CAMP: A Programming Aide for Multiprocessors," Proc. 1986 ICPP, IEEE Comp. Soc., pp. 475-482. [25] Pfister, G. F., and V. A. Norton, "Hot Spot Contention and Combining in Multistage Interconnection Networks," IEEE Trans. Comp., C-34, Oct…

  3. The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

    1997-01-01

Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore exploit, behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring System (AIMS) has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAS Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.

  4. Thread concept for automatic task parallelization in image analysis

    NASA Astrophysics Data System (ADS)

    Lueckenhaus, Maximilian; Eckstein, Wolfgang

    1998-09-01

Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context but may follow different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.

  5. Heterogeneous Hardware Parallelism Review of the IN2P3 2016 Computing School

    NASA Astrophysics Data System (ADS)

    Lafage, Vincent

    2017-11-01

Parallel and hybrid Monte Carlo computation: the Monte Carlo method is the main workhorse for computation of particle physics observables. This paper provides an overview of various HPC technologies that can be used today: multicore (OpenMP, HPX), manycore (OpenCL). The rewrite of a twenty-year-old Fortran 77 Monte Carlo code will illustrate the various programming paradigms in use beyond language implementation. The problem of parallel random number generation will be addressed. We will give a short report of the one-week school dedicated to these recent approaches, which took place at École Polytechnique in May 2016.
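
    The parallel random number problem mentioned here, in its simplest form: each thread needs its own generator state so that the streams neither collide nor correlate. The sketch below estimates pi with a hand-rolled per-thread LCG, purely for illustration; production codes use dedicated parallel RNG libraries with proven stream independence.

    ```c
    /* Per-thread RNG state in an OpenMP Monte Carlo pi estimate. */
    #include <omp.h>
    #include <stdio.h>

    /* Minimal linear congruential generator (illustrative only). */
    static double lcg(unsigned long *state)
    {
        *state = *state * 6364136223846793005UL + 1442695040888963407UL;
        return (*state >> 11) * (1.0 / 9007199254740992.0); /* [0,1) */
    }

    int main(void)
    {
        long hits = 0;
        const long trials = 1000000;
        #pragma omp parallel reduction(+:hits)
        {
            /* Each thread seeds its own private stream. */
            unsigned long state = 0x9E3779B9UL * (omp_get_thread_num() + 1);
            #pragma omp for
            for (long i = 0; i < trials; i++) {
                double x = lcg(&state), y = lcg(&state);
                if (x * x + y * y < 1.0) hits++;
            }
        }
        printf("pi ~= %f\n", 4.0 * hits / trials);
        return 0;
    }
    ```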

  6. Parallel computing for probabilistic fatigue analysis

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.

    1993-01-01

This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism, to keep large numbers of processors busy, and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large-scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.

  7. Message Passing and Shared Address Space Parallelism on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.

  8. Parallel computation and the Basis system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, G.R.

    1992-12-16

    A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communication costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.
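
    The master-and-slaves pattern with domain decomposition described here can be suggested by a short message-passing sketch. The prototype used PVM; plain MPI is substituted below for brevity, and the subdomain work is invented:

      #include <mpi.h>
      #include <stdio.h>

      #define N 1000  /* global domain size (illustrative) */

      int main(int argc, char **argv) {
          int rank, size;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          if (size < 2) {   /* needs one master plus at least one slave */
              if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
              MPI_Finalize();
              return 1;
          }

          if (rank == 0) {             /* master: assign subdomains */
              for (int s = 1; s < size; s++) {
                  int bounds[2] = { (s - 1) * N / (size - 1),
                                    s * N / (size - 1) };
                  MPI_Send(bounds, 2, MPI_INT, s, 0, MPI_COMM_WORLD);
              }
              double total = 0.0, part;
              for (int s = 1; s < size; s++) {
                  MPI_Recv(&part, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 1,
                           MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                  total += part;
              }
              printf("total = %f\n", total);
          } else {                     /* slave: work on its subdomain */
              int b[2];
              MPI_Recv(b, 2, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              double part = 0.0;
              for (int i = b[0]; i < b[1]; i++) part += (double)i;
              MPI_Send(&part, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
          }
          MPI_Finalize();
          return 0;
      }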

  9. Parallels between New Paradigms in Science and in Reading and Literary Theories.

    ERIC Educational Resources Information Center

    Weaver, Constance

    Drawing upon research from a number of fields, this paper explores parallels between new paradigms in the sciences--particularly physics, chemistry, and biology--and new paradigms in reading and literary theory--particularly a socio- or psycholinguistic, semiotic, transactional view of reading and a transactional view of the literary experience.…

  10. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: the OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads, and the Paraver graphical user interface for inspection and analysis of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory architectures.

  11. Parallel Processing with Digital Signal Processing Hardware and Software

    NASA Technical Reports Server (NTRS)

    Swenson, Cory V.

    1995-01-01

    The assembling and testing of a parallel processing system is described which will allow a user to move a Digital Signal Processing (DSP) application from the design stage to the execution/analysis stage through the use of several software tools and hardware devices. The system will be used to demonstrate the feasibility of the Algorithm To Architecture Mapping Model (ATAMM) dataflow paradigm for static multiprocessor solutions of DSP applications. The individual components comprising the system are described followed by the installation procedure, research topics, and initial program development.

  12. MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Taft, James R.

    1999-01-01

    Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA-based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector-oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
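
    The flavor of MLP, coarse-grain processes sharing a memory arena plus fine-grain threaded loops, can be approximated with standard POSIX facilities. The sketch below illustrates the style only and is not NAS's MLP library: a shared anonymous mapping plays the arena, fork() supplies the coarse-grain processes, and OpenMP supplies the loop-level parallelism.

      #include <omp.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/wait.h>
      #include <unistd.h>

      #define NPROC 2
      #define N 1000000

      int main(void) {
          /* Shared memory "arena": visible to all forked processes. */
          double *arena = mmap(NULL, N * sizeof(double),
                               PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
          if (arena == MAP_FAILED) return 1;

          for (int p = 0; p < NPROC; p++) {
              if (fork() == 0) {                /* coarse grain: processes */
                  long lo = (long)p * N / NPROC;
                  long hi = (long)(p + 1) * N / NPROC;
                  #pragma omp parallel for       /* fine grain: OpenMP loop */
                  for (long i = lo; i < hi; i++)
                      arena[i] = (double)i * 0.5;
                  _exit(0);
              }
          }
          for (int p = 0; p < NPROC; p++) wait(NULL);
          printf("arena[N-1] = %f\n", arena[N - 1]);
          munmap(arena, N * sizeof(double));
          return 0;
      }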

  13. Parallel, Distributed Scripting with Python

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc. These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and coordinate the work.
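
    The password-checker example parallelizes in an obvious way: partition the dictionary across ranks, test locally, and reduce the verdicts. A C/MPI sketch of that pattern (a plain string comparison and a four-word toy dictionary stand in for the real hash check):

      #include <mpi.h>
      #include <stdio.h>
      #include <string.h>

      /* Toy dictionary; a real checker would hash each candidate. */
      static const char *dict[] = { "alpha", "bravo", "secret", "delta" };
      #define NWORDS 4

      int main(int argc, char **argv) {
          int rank, size, found = 0;
          const char *target = "secret";

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          /* Each rank tests a strided share of the dictionary. */
          for (int i = rank; i < NWORDS; i += size)
              if (strcmp(dict[i], target) == 0) found = 1;

          int any;
          MPI_Reduce(&found, &any, 1, MPI_INT, MPI_LOR, 0, MPI_COMM_WORLD);
          if (rank == 0)
              printf(any ? "password found\n" : "not in dictionary\n");

          MPI_Finalize();
          return 0;
      }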

  14. Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers

    NASA Technical Reports Server (NTRS)

    Su, Ernesto; Lain, Antonio; Ramaswamy, Shankar; Palermo, Daniel J.; Hodges, Eugene W., IV; Banerjee, Prithviraj

    1995-01-01

    The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. A previous implementation of the compiler based on the PTD representation allowed symbolic array sizes, affine loop bounds and array subscripts, and a variable number of processors, provided that arrays were single or multi-dimensionally block distributed. The techniques presented here extend the compiler to also accept multidimensional cyclic and block-cyclic distributions within a uniform symbolic framework. These extensions demand more sophisticated symbolic manipulation capabilities. A novel aspect of our approach is to meet this demand by interfacing PARADIGM with a powerful off-the-shelf symbolic package, Mathematica. This paper describes some of the Mathematica routines that perform various transformations, shows how they are invoked and used by the compiler to overcome the new challenges, and presents experimental results for code involving cyclic and block-cyclic arrays as evidence of the feasibility of the approach.
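
    The index arithmetic behind a one-dimensional block-cyclic distribution, which a compiler like PARADIGM must manipulate symbolically, is compact enough to state directly. A sketch of the standard owner and local-index maps for block size b on P processors:

      #include <stdio.h>

      /* Standard 1-D block-cyclic maps: global index g, block size b, P procs. */
      static int owner(int g, int b, int P)       { return (g / b) % P; }
      static int local_index(int g, int b, int P) { return ((g / b) / P) * b + g % b; }

      int main(void) {
          int b = 4, P = 3;
          for (int g = 0; g < 24; g++)
              printf("g=%2d -> proc %d, local %d\n",
                     g, owner(g, b, P), local_index(g, b, P));
          return 0;
      }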

  15. Cellular automata with object-oriented features for parallel molecular network modeling.

    PubMed

    Zhu, Hao; Wu, Yinghui; Huang, Sui; Sun, Yan; Dhar, Pawan

    2005-06-01

    Cellular automata are an important modeling paradigm for studying the dynamics of large, parallel systems composed of multiple, interacting components. However, to model biological systems, cellular automata need to be extended beyond the large-scale parallelism and intensive communication in order to capture two fundamental properties characteristic of complex biological systems: hierarchy and heterogeneity. This paper proposes extensions to a cellular automata language, Cellang, to meet this purpose. The extended language, with object-oriented features, can be used to describe the structure and activity of parallel molecular networks within cells. Capabilities of this new programming language include object structure to define molecular programs within a cell, floating-point data type and mathematical functions to perform quantitative computation, message passing capability to describe molecular interactions, as well as new operators, statements, and built-in functions. We discuss relevant programming issues of these features, including the object-oriented description of molecular interactions with molecule encapsulation, message passing, and the description of heterogeneity and anisotropy at the cell and molecule levels. By enabling the integration of modeling at the molecular level with system behavior at cell, tissue, organ, or even organism levels, the program will help improve our understanding of how complex and dynamic biological activities are generated and controlled by parallel functioning of molecular networks.

  16. Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang; VanderWijngaart, Rob F.

    2003-01-01

    We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in benchmarks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
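
    The multi-zone strategy reduces to a familiar hybrid skeleton: each MPI rank owns zones and exchanges boundary data after every step, while OpenMP threads share the loops inside a zone. A schematic C sketch, with one zone per rank and a trivial update standing in for the real solver:

      #include <mpi.h>
      #include <string.h>

      #define NX 100
      #define NSTEPS 10

      /* One zone per MPI rank for brevity; real codes map several zones per rank. */
      static double zone[NX], znew[NX];

      static void exchange_boundaries(int rank, int size) {
          if (size < 2) return;   /* schematic halo swap with neighbor zones */
          int left = (rank + size - 1) % size, right = (rank + 1) % size;
          MPI_Sendrecv(&zone[NX - 2], 1, MPI_DOUBLE, right, 0,
                       &zone[0],      1, MPI_DOUBLE, left,  0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      }

      int main(int argc, char **argv) {
          int rank, size;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          for (int i = 0; i < NX; i++) zone[i] = rank;

          for (int step = 0; step < NSTEPS; step++) {
              #pragma omp parallel for         /* fine grain: threads in a zone */
              for (int i = 1; i < NX - 1; i++)
                  znew[i] = 0.5 * (zone[i - 1] + zone[i + 1]);
              memcpy(&zone[1], &znew[1], (NX - 2) * sizeof(double));
              exchange_boundaries(rank, size); /* coarse grain: between zones */
          }
          MPI_Finalize();
          return 0;
      }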

  17. The Automated Instrumentation and Monitoring System (AIMS) reference manual

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Hontalas, Philip; Listgarten, Sherry

    1993-01-01

    Whether a researcher is designing the 'next parallel programming paradigm,' another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of execution traces can help computer designers and software architects to uncover system behavior and to take advantage of specific application characteristics and hardware features. A software tool kit that facilitates performance evaluation of parallel applications on multiprocessors is described. The Automated Instrumentation and Monitoring System (AIMS) has four major software components: a source code instrumentor which automatically inserts active event recorders into the program's source code before compilation; a run-time performance-monitoring library, which collects performance data; a trace file animation and analysis tool kit which reconstructs program execution from the trace file; and a trace post-processor which compensates for data collection overhead. Besides being used as a prototype for developing new techniques for instrumenting, monitoring, and visualizing parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware test beds to evaluate their impact on user productivity. Currently, AIMS instrumentors accept FORTRAN and C parallel programs written for Intel's NX operating system on the iPSC family of multicomputers. A run-time performance-monitoring library for the iPSC/860 is included in this release. We plan to release monitors for other platforms (such as PVM and TMC's CM-5) in the near future. Performance data collected can be graphically displayed on workstations (e.g. Sun Sparc and SGI) supporting X-Windows (in particular, X11R5, Motif 1.1.3).

  18. Improving Running Times for the Determination of Fractional Snow-Covered Area from Landsat TM/ETM+ via Utilization of the CUDA® Programming Paradigm

    NASA Astrophysics Data System (ADS)

    McGibbney, L. J.; Rittger, K.; Painter, T. H.; Selkowitz, D.; Mattmann, C. A.; Ramirez, P.

    2014-12-01

    As part of a JPL-USGS collaboration to expand distribution of essential climate variables (ECV) to include on-demand fractional snow cover we describe our experience and implementation of a shift towards the use of NVIDIA's CUDA® parallel computing platform and programming model. In particular the on-demand aspect of this work involves the improvement (via faster processing and a reduction in overall running times) for determination of fractional snow-covered area (fSCA) from Landsat TM/ETM+. Our observations indicate that processing tasks associated with remote sensing including the Snow Covered Area and Grain Size Model (SCAG) when applied to MODIS or LANDSAT TM/ETM+ are computationally intensive processes. We believe the shift to the CUDA programming paradigm represents a significant improvement in the ability to more quickly assess the outcomes of such activities. We use the TMSCAG model as our subject to highlight this argument. We do this by describing how we can ingest a LANDSAT surface reflectance image (typically provided in HDF format), perform spectral mixture analysis to produce land cover fractions including snow, vegetation and rock/soil whilst greatly reducing running time for such tasks. Within the scope of this work we first document the original workflow used to assess fSCA for Landsat TM and its primary shortcomings. We then introduce the logic and justification behind the switch to the CUDA paradigm for running single as well as batch jobs on the GPU in order to achieve parallel processing. Finally we share lessons learned from the consolidation of a myriad of existing algorithms into a single set of code in a single target language as well as the benefits this ultimately provides to scientists at the USGS.

  19. Methodologies and systems for heterogeneous concurrent computing

    NASA Technical Reports Server (NTRS)

    Sunderam, V. S.

    1994-01-01

    Heterogeneous concurrent computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss critical design and implementation issues in heterogeneous concurrent computing, and describe techniques for enhancing its effectiveness. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the context of the PVM system and comment on ongoing and future work.

  20. Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak

    1999-01-01

    The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.

  1. Application Characterization at Scale: Lessons learned from developing a distributed Open Community Runtime system for High Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Landwehr, Joshua B.; Suetterlein, Joshua D.; Marquez, Andres

    2016-05-16

    Since 2012, the U.S. Department of Energy’s X-Stack program has been developing solutions including runtime systems, programming models, languages, compilers, and tools for the Exascale system software to address crucial performance and power requirements. Fine grain programming models and runtime systems show a great potential to efficiently utilize the underlying hardware. Thus, they are essential to many X-Stack efforts. An abundance of small tasks can better utilize the vast parallelism available on current and future machines. Moreover, finer tasks can recover faster and adapt better, due to a decrease in state and control. Nevertheless, current applications have been written to exploit old paradigms (such as Communicating Sequential Processes and Bulk Synchronous Parallel processing). To fully utilize the advantages of these new systems, applications need to be adapted to these new paradigms. As part of the applications’ porting process, in-depth characterization studies, focused on both application characteristics and runtime features, need to take place to fully understand the application performance bottlenecks and how to resolve them. This paper presents a characterization study for a novel high performance runtime system, called the Open Community Runtime, using key HPC kernels as its vehicle. This study has the following contributions: one of the first high performance, fine grain, distributed memory runtime system implementing the OCR standard (version 0.99a); and a characterization study of key HPC kernels in terms of runtime primitives running on both intra and inter node environments. Running on a general purpose cluster, we have found up to 1635x relative speed-up for a parallel tiled Cholesky kernel on 128 nodes with 16 cores each and a 1864x relative speed-up for a parallel tiled Smith-Waterman kernel on 128 nodes with 30 cores.

  2. On the Optimality of Serial and Parallel Processing in the Psychological Refractory Period Paradigm: Effects of the Distribution of Stimulus Onset Asynchronies

    ERIC Educational Resources Information Center

    Miller, Jeff; Ulrich, Rolf; Rolke, Bettina

    2009-01-01

    Within the context of the psychological refractory period (PRP) paradigm, we developed a general theoretical framework for deciding when it is more efficient to process two tasks in serial and when it is more efficient to process them in parallel. This analysis suggests that a serial mode is more efficient than a parallel mode under a wide variety…

  3. Rule-based programming paradigm: a formal basis for biological, chemical and physical computation.

    PubMed

    Krishnamurthy, V; Krishnamurthy, E V

    1999-03-01

    A rule-based programming paradigm is described as a formal basis for biological, chemical and physical computations. In this paradigm, the computations are interpreted as the outcome arising out of interaction of elements in an object space. The interactions can create new elements (or same elements with modified attributes) or annihilate old elements according to specific rules. Since the interaction rules are inherently parallel, any number of actions can be performed cooperatively or competitively among the subsets of elements, so that the elements evolve toward an equilibrium, unstable, or chaotic state. Such an evolution may retain certain invariant properties of the attributes of the elements. The object space resembles Gibbsian ensemble that corresponds to a distribution of points in the space of positions and momenta (called phase space). It permits the introduction of probabilities in rule applications. As each element of the ensemble changes over time, its phase point is carried into a new phase point. The evolution of this probability cloud in phase space corresponds to a distributed probabilistic computation. Thus, this paradigm can handle deterministic exact computation when the initial conditions are exactly specified and the trajectory of evolution is deterministic. Also, it can handle a probabilistic mode of computation if we want to derive macroscopic or bulk properties of matter. We also explain how to support this rule-based paradigm using relational-database-like query processing and transactions.
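
    A toy rendering of the object-space idea: elements interact pairwise under probabilistic rules that may annihilate them, and the population evolves toward a steady state. The rule below is invented purely to illustrate the paradigm:

      #include <stdio.h>
      #include <stdlib.h>

      #define NELEM 100

      int main(void) {
          /* Object space: elements of species A (1) and B (2); 0 = annihilated. */
          int space[NELEM];
          srand(42);
          for (int i = 0; i < NELEM; i++) space[i] = 1 + rand() % 2;

          for (int step = 0; step < 1000; step++) {
              int i = rand() % NELEM, j = rand() % NELEM;
              if (i == j || !space[i] || !space[j]) continue;
              /* Invented rule: an A-B encounter annihilates both, probability 1/2. */
              if (space[i] != space[j] && rand() % 2 == 0)
                  space[i] = space[j] = 0;
          }

          int na = 0, nb = 0;
          for (int i = 0; i < NELEM; i++) {
              na += (space[i] == 1);
              nb += (space[i] == 2);
          }
          printf("survivors: %d A, %d B\n", na, nb);
          return 0;
      }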

  4. Automatic data partitioning on distributed memory multicomputers. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Gupta, Manish

    1992-01-01

    Distributed-memory parallel computers are increasingly being used to provide high levels of performance for scientific applications. Unfortunately, such machines are not very easy to program. A number of research efforts seek to alleviate this problem by developing compilers that take over the task of generating communication. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial task of determining a good data partitioning scheme. A novel approach is presented, the constraints-based approach, to the problem of automatic data partitioning for numeric programs. In this approach, the compiler identifies some desirable requirements on the distribution of various arrays being referenced in each statement, based on performance considerations. These desirable requirements are referred to as constraints. For each constraint, the compiler determines a quality measure that captures its importance with respect to the performance of the program. The quality measure is obtained through static performance estimation, without actually generating the target data-parallel program with explicit communication. Each data distribution decision is taken by combining all the relevant constraints. The compiler attempts to resolve any conflicts between constraints such that the overall execution time of the parallel program is minimized. This approach has been implemented as part of a compiler called Paradigm, that accepts Fortran 77 programs, and specifies the partitioning scheme to be used for each array in the program. We have obtained results on some programs taken from the Linpack and Eispack libraries, and the Perfect Benchmarks. These results are quite promising, and demonstrate the feasibility of automatic data partitioning for a significant class of scientific application programs with regular computations.

  5. Integrated Task and Data Parallel Programming

    NASA Technical Reports Server (NTRS)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.

  6. Integrated Task And Data Parallel Programming: Language Design

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.

  7. Extending HPF for advanced data parallel applications

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

    1994-01-01

    The stated goal of High Performance Fortran (HPF) was to 'address the problems of writing data parallel programs where the distribution of data affects performance'. After examining the current version of the language we are led to the conclusion that HPF has not fully achieved this goal. While the basic distribution functions offered by the language - regular block, cyclic, and block cyclic distributions - can support regular numerical algorithms, advanced applications such as particle-in-cell codes or unstructured mesh solvers cannot be expressed adequately. We believe that this is a major weakness of HPF, significantly reducing its chances of becoming accepted in the numeric community. The paper discusses the data distribution and alignment issues in detail, points out some flaws in the basic language, and outlines possible future paths of development. Furthermore, we briefly deal with the issue of task parallelism and its integration with the data parallel paradigm of HPF.

  8. An efficient 3-dim FFT for plane wave electronic structure calculations on massively parallel machines composed of multiprocessor nodes

    NASA Astrophysics Data System (ADS)

    Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry

    2003-08-01

    Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining high performance on a large number of processors is non-trivial on the latest generation of parallel computers that consist of nodes made up of shared memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
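
    The heart of such a combined MPI/OpenMP 3-D FFT is a slab decomposition: threaded 1-D FFTs along the locally contiguous dimensions, then an all-to-all transpose to make the remaining dimension local. A schematic sketch in which fft1d is a hypothetical stand-in for a real routine (e.g., from FFTW) and the local repacking around the exchange is omitted:

      #include <mpi.h>
      #include <complex.h>
      #include <stdio.h>
      #include <stdlib.h>

      #define N 16   /* global grid N^3; N must be divisible by the rank count */

      /* Hypothetical 1-D FFT stand-in; a real code calls FFTW or ESSL here. */
      static void fft1d(double complex *line, int n) { (void)line; (void)n; }

      int main(int argc, char **argv) {
          int rank, P;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &P);

          int nloc = N / P;                 /* slab: nloc planes of N x N */
          double complex *slab = calloc((size_t)nloc * N * N, sizeof *slab);

          #pragma omp parallel for          /* threaded FFTs on local dims */
          for (int i = 0; i < nloc * N; i++)
              fft1d(&slab[(long)i * N], N);

          /* All-to-all transpose makes the remaining dimension local
             (local repacking before/after the exchange is omitted here). */
          MPI_Alltoall(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                       slab, nloc * nloc * N, MPI_C_DOUBLE_COMPLEX,
                       MPI_COMM_WORLD);

          #pragma omp parallel for          /* FFTs along the new local dim */
          for (int i = 0; i < nloc * N; i++)
              fft1d(&slab[(long)i * N], N);

          if (rank == 0) puts("3-D FFT sketch complete");
          free(slab);
          MPI_Finalize();
          return 0;
      }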

  9. Parallel, distributed and GPU computing technologies in single-particle electron microscopy

    PubMed Central

    Schmeisser, Martin; Heisen, Burkhard C.; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-01-01

    Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today’s technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined. PMID:19564686

  10. Parallel, distributed and GPU computing technologies in single-particle electron microscopy.

    PubMed

    Schmeisser, Martin; Heisen, Burkhard C; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-07-01

    Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today's technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined.

  11. The neural basis of parallel saccade programming: an fMRI study.

    PubMed

    Hu, Yanbo; Walker, Robin

    2011-11-01

    The neural basis of parallel saccade programming was examined in an event-related fMRI study using a variation of the double-step saccade paradigm. Two double-step conditions were used: one enabled the second saccade to be partially programmed in parallel with the first saccade while in a second condition both saccades had to be prepared serially. The intersaccadic interval, observed in the parallel programming (PP) condition, was significantly reduced compared with latency in the serial programming (SP) condition and also to the latency of single saccades in control conditions. The fMRI analysis revealed greater activity (BOLD response) in the frontal and parietal eye fields for the PP condition compared with the SP double-step condition and when compared with the single-saccade control conditions. By contrast, activity in the supplementary eye fields was greater for the double-step condition than the single-step condition but did not distinguish between the PP and SP requirements. The role of the frontal eye fields in PP may be related to the advanced temporal preparation and increased salience of the second saccade goal that may mediate activity in other downstream structures, such as the superior colliculus. The parietal lobes may be involved in the preparation for spatial remapping, which is required in double-step conditions. The supplementary eye fields appear to have a more general role in planning saccade sequences that may be related to error monitoring and the control over the execution of the correct sequence of responses.

  12. Toward a new paradigm of DNA writing using a massively parallel sequencing platform and degenerate oligonucleotide

    PubMed Central

    Hwang, Byungjin; Bang, Duhee

    2016-01-01

    All synthetic DNA materials require prior programming of the building blocks of the oligonucleotide sequences. The development of a programmable microarray platform provides cost-effective and time-efficient solutions in the field of data storage using DNA. However, the scalability of the synthesis is not on par with the accelerating sequencing capacity. Here, we report on a new paradigm of generating genetic material (writing) using a degenerate oligonucleotide and optomechanical retrieval method that leverages sequencing (reading) throughput to generate the desired number of oligonucleotides. As a proof of concept, we demonstrate the feasibility of our concept in digital information storage in DNA. In simulation, the ability to store data is expected to exponentially increase with increase in degenerate space. The present study highlights the major framework change in conventional DNA writing paradigm as a sequencer itself can become a potential source of making genetic materials. PMID:27876825

  13. Toward a new paradigm of DNA writing using a massively parallel sequencing platform and degenerate oligonucleotide.

    PubMed

    Hwang, Byungjin; Bang, Duhee

    2016-11-23

    All synthetic DNA materials require prior programming of the building blocks of the oligonucleotide sequences. The development of a programmable microarray platform provides cost-effective and time-efficient solutions in the field of data storage using DNA. However, the scalability of the synthesis is not on par with the accelerating sequencing capacity. Here, we report on a new paradigm of generating genetic material (writing) using a degenerate oligonucleotide and optomechanical retrieval method that leverages sequencing (reading) throughput to generate the desired number of oligonucleotides. As a proof of concept, we demonstrate the feasibility of our concept in digital information storage in DNA. In simulation, the ability to store data is expected to exponentially increase with increase in degenerate space. The present study highlights the major framework change in conventional DNA writing paradigm as a sequencer itself can become a potential source of making genetic materials.

  14. Software Assurance Challenges for the Commercial Crew Program

    NASA Technical Reports Server (NTRS)

    Cuyno, Patrick; Malnick, Kathy D.; Schaeffer, Chad E.

    2015-01-01

    This paper will provide a description of some of the challenges NASA is facing in providing software assurance within the new commercial space services paradigm, namely with the Commercial Crew Program (CCP). The CCP will establish safe, reliable, and affordable access to the International Space Station (ISS) by purchasing a ride from commercial companies. The CCP providers have varying experience with software development in safety-critical space systems. NASA's role in providing effective software assurance support to the CCP providers is critical to the success of CCP. These challenges include funding multiple vehicles that execute in parallel and have different rules of engagement, multiple providers with unique proprietary concerns, providing equivalent guidance to all providers, permitting alternates to NASA standards, and a large number of diverse stakeholders. It is expected that these challenges will exist in future programs, especially if the CCP paradigm proves successful. The proposed CCP approach to address these challenges includes a risk-based assessment with varying degrees of engagement and a distributed assurance model. This presentation will describe NASA IV&V Program's software assurance support and responses to these challenges.

  15. Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver

    NASA Astrophysics Data System (ADS)

    Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre

    2014-06-01

    This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool.

  16. chemf: A purely functional chemistry toolkit.

    PubMed

    Höck, Stefan; Riedl, Rainer

    2012-12-20

    Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, these languages offer a lot of support when writing immutable data structures and side-effects free code. We therefore started implementing our own toolkit based on the above programming paradigms in a modern, versatile programming language. We present our initial results with functional programming in chemistry by first describing an immutable data structure for molecular graphs together with a couple of simple algorithms to calculate basic molecular properties before writing a complete SMILES parser in accordance with the OpenSMILES specification. Along the way we show how to deal with input validation, error handling, bulk operations, and parallelization in a purely functional way. At the end we also analyze and improve our algorithms and data structures in terms of performance and compare it to existing toolkits both object-oriented and purely functional. All code was written in Scala, a modern multi-paradigm programming language with a strong support for functional programming and a highly sophisticated type system. We have successfully made the first important steps towards a purely functional chemistry toolkit. The data structures and algorithms presented in this article perform well while at the same time they can be safely used in parallelized applications, such as computer aided drug design experiments, without further adjustments. This stands in contrast to existing object-oriented toolkits where thread safety of data structures and algorithms is a deliberate design decision that can be hard to implement. Finally, the level of type-safety achieved by Scala highly increased the reliability of our code as well as the productivity of the programmers involved in this project.

  17. chemf: A purely functional chemistry toolkit

    PubMed Central

    2012-01-01

    Background Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, these languages offer a lot of support when writing immutable data structures and side-effects free code. We therefore started implementing our own toolkit based on the above programming paradigms in a modern, versatile programming language. Results We present our initial results with functional programming in chemistry by first describing an immutable data structure for molecular graphs together with a couple of simple algorithms to calculate basic molecular properties before writing a complete SMILES parser in accordance with the OpenSMILES specification. Along the way we show how to deal with input validation, error handling, bulk operations, and parallelization in a purely functional way. At the end we also analyze and improve our algorithms and data structures in terms of performance and compare it to existing toolkits both object-oriented and purely functional. All code was written in Scala, a modern multi-paradigm programming language with a strong support for functional programming and a highly sophisticated type system. Conclusions We have successfully made the first important steps towards a purely functional chemistry toolkit. The data structures and algorithms presented in this article perform well while at the same time they can be safely used in parallelized applications, such as computer aided drug design experiments, without further adjustments. This stands in contrast to existing object-oriented toolkits where thread safety of data structures and algorithms is a deliberate design decision that can be hard to implement. Finally, the level of type-safety achieved by Scala highly increased the reliability of our code as well as the productivity of the programmers involved in this project. PMID:23253942

  18. Mentat: An object-oriented macro data flow system

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.; Liu, Jane W. S.

    1988-01-01

    Mentat, an object-oriented macro data flow system designed to facilitate parallelism in distributed systems, is presented. The macro data flow model is a model of computation similar to the data flow model with two principal differences: the computational complexity of the actors is much greater than in traditional data flow systems, and there are persistent actors that maintain state information between executions. Mentat is a system that combines the object-oriented programming paradigm and the macro data flow model of computation. Mentat programs use a dynamic structure called a future list to represent the future of computations.

  19. Case studies within a mixed methods paradigm: toward a resolution of the alienation between researcher and practitioner in psychotherapy research.

    PubMed

    Dattilio, Frank M; Edwards, David J A; Fishman, Daniel B

    2010-12-01

    This article addresses the long-standing divide between researchers and practitioners in the field of psychotherapy, regarding what really works in treatment and the extent to which interventions should be governed by outcomes generated in a "laboratory atmosphere." This alienation has its roots in a positivist paradigm, which is epistemologically incomplete because it fails to provide for context-based practical knowledge. In other fields of evaluation research, it has been superseded by a mixed methods paradigm, which embraces pragmatism and multiplicity. On the basis of this paradigm, we propose and illustrate new scientific standards for research on the evaluation of psychotherapeutic treatments. These include the requirement that projects should comprise several parallel studies that involve randomized controlled trials, qualitative examinations of the implementation of treatment programs, and systematic case studies. The uniqueness of this article is that it contributes a guideline for involving a set of complementary publications, including a review that offers an overall synthesis of the findings from different methodological approaches. (PsycINFO Database Record (c) 2010 APA, all rights reserved).

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Demeure, I.M.

    The research presented here is concerned with representation techniques and tools to support the design, prototyping, simulation, and evaluation of message-based parallel, distributed computations. The author describes ParaDiGM (Parallel, Distributed computation Graph Model), a visual representation technique for parallel, message-based distributed computations. ParaDiGM provides several views of a computation depending on the aspect of concern. It is made of two complementary submodels, the DCPG (Distributed Computing Precedence Graph) model and the PAM (Process Architecture Model) model. DCPGs are precedence graphs used to express the functionality of a computation in terms of tasks, message-passing, and data. PAM graphs are used to represent the partitioning of a computation into schedulable units or processes, and the pattern of communication among those units. There is a natural mapping between the two models. He illustrates the utility of ParaDiGM as a representation technique by applying it to various computations (e.g., an adaptive global optimization algorithm, the client-server model). ParaDiGM representations are concise. They can be used in documenting the design and the implementation of parallel, distributed computations, in describing such computations to colleagues, and in comparing and contrasting various implementations of the same computation. He then describes VISA (VISual Assistant), a software tool to support the design, prototyping, and simulation of message-based parallel, distributed computations. VISA is based on the ParaDiGM model. In particular, it supports the editing of ParaDiGM graphs to describe the computations of interest, and the animation of these graphs to provide visual feedback during simulations. The graphs are supplemented with various attributes, simulation parameters, and interpretations, which are procedures that can be executed by VISA.

  1. Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

    PubMed

    Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

    2015-01-01

    Technological advances of Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings lead to an ever-increasing amount of raw data being generated. Arrays with hundreds up to a few thousand electrodes are slowly seeing widespread use and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance-critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
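
    The channel-parallel pre-processing steps named here, filtering and threshold-based spike detection, map directly onto data-parallel loops. A simplified OpenMP sketch in C, with synthetic channel data and a plain RMS-threshold detector standing in for the real pipeline:

      #include <math.h>
      #include <stdio.h>
      #include <stdlib.h>

      #define NCHAN 64
      #define NSAMP 100000

      int main(void) {
          /* Synthetic recording: NCHAN channels of NSAMP samples. */
          float *data = malloc(sizeof(float) * NCHAN * NSAMP);
          for (long i = 0; i < (long)NCHAN * NSAMP; i++)
              data[i] = (float)rand() / RAND_MAX - 0.5f;

          long spikes[NCHAN];

          /* Channels are independent: one loop iteration per channel. */
          #pragma omp parallel for
          for (int c = 0; c < NCHAN; c++) {
              float *x = data + (long)c * NSAMP;
              /* Threshold at 4x the channel's RMS, a common simple rule. */
              double ss = 0.0;
              for (int t = 0; t < NSAMP; t++) ss += x[t] * x[t];
              float thr = 4.0f * (float)sqrt(ss / NSAMP);
              long n = 0;
              for (int t = 1; t < NSAMP; t++)
                  if (x[t] > thr && x[t - 1] <= thr) n++;  /* upward crossings */
              spikes[c] = n;
          }

          printf("channel 0: %ld spike events\n", spikes[0]);
          free(data);
          return 0;
      }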

  2. Performance of OVERFLOW-D Applications based on Hybrid and MPI Paradigms on IBM Power4 System

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Biegel, Bryan (Technical Monitor)

    2002-01-01

    This report briefly discusses our preliminary performance experiments with parallel versions of OVERFLOW-D applications. These applications are based on MPI and hybrid paradigms on the IBM Power4 system here at the NAS Division. This work is part of an effort to determine the suitability of the system and its parallel libraries (MPI/OpenMP) for specific scientific computing objectives.

  3. Proceedings of the workshop on Compilation of (Symbolic) Languages for Parallel Computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, I.; Tick, E.

    1991-11-01

    This report comprises the abstracts and papers for the talks presented at the Workshop on Compilation of (Symbolic) Languages for Parallel Computers, held October 31--November 1, 1991, in San Diego. These unrefereed contributions were provided by the participants for the purpose of this workshop; many of them will be published elsewhere in peer-reviewed conferences and publications. Our goal in planning this workshop was to bring together researchers from different disciplines with common problems in compilation. In particular, we wished to encourage interaction between researchers working in compilation of symbolic languages and those working on compilation of conventional, imperative languages. The fundamental problems facing researchers interested in compilation of logic, functional, and procedural programming languages for parallel computers are essentially the same. However, differences in the basic programming paradigms have led to different communities emphasizing different species of the parallel compilation problem. For example, parallel logic and functional languages provide dataflow-like formalisms in which control dependencies are unimportant. Hence, a major focus of research in compilation has been on techniques that try to infer when sequential control flow can safely be imposed. Granularity analysis for scheduling is a related problem. The single-assignment property leads to a need for analysis of memory use in order to detect opportunities for reuse. Much of the work in each of these areas relies on the use of abstract interpretation techniques.

  4. Paradigms and strategies for scientific computing on distributed memory concurrent computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, I.T.; Walker, D.W.

    1994-06-01

    In this work we examine recent advances in parallel languages and abstractions that have the potential for improving the programmability and maintainability of large-scale, parallel, scientific applications running on high performance architectures and networks. This paper focuses on Fortran M, a set of extensions to Fortran 77 that supports the modular design of message-passing programs. We describe the Fortran M implementation of a particle-in-cell (PIC) plasma simulation application, and discuss issues in the optimization of the code. The use of two other methodologies for parallelizing the PIC application is considered. The first is based on the shared object abstraction as embodied in the Orca language. The second approach is the Split-C language. In Fortran M, Orca, and Split-C the ability of the programmer to control the granularity of communication is important in designing an efficient implementation.

  5. Parallel Ada benchmarks for the SVMS

    NASA Technical Reports Server (NTRS)

    Collard, Philippe E.

    1990-01-01

    The use of the parallel processing paradigm to design and develop faster and more reliable computers appears to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through the tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently the Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with a version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed that would measure Ada tasking efficiency on parallel architectures as well as determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools in the development of the SVMS architecture.

  6. Implementation of an Object-Oriented Flight Simulator D.C. Electrical System on a Hypercube Architecture

    DTIC Science & Technology

    1991-12-01

    abstract data type is, what an object-oriented design is and how to apply "software engineering" principles to the design of both of them. I owe a great... Program (ASVP), a research and development effort by two aerospace contractors to redesign and implement subsets of two existing flight simulators in... effort addresses how to implement a simulator designed using the SEI OOD Paradigm on a distributed, parallel, multiple instruction, multiple data (MIMD

  7. Parallel dispatch: a new paradigm of electrical power system dispatch

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jun Jason; Wang, Fei-Yue; Wang, Qiang

    Modern power systems are evolving into sociotechnical systems with massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, the need for developing and applying new intelligent power system dispatch tools is of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top level design approach of an intelligent dispatch system, and the parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely parallel dispatch, can be established by incorporating various intelligent technologies, especially the parallel intelligent technology, to enable secure operation of complex power grids, extend system operators' capabilities, suggest optimal dispatch strategies, and provide decision-making recommendations according to power system operational goals.

  8. A set of parallel, implicit methods for a reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids

    DOE PAGES

    Xia, Yidong; Luo, Hong; Frisbey, Megan; ...

    2014-07-01

    A set of implicit methods is proposed for a third-order hierarchical WENO reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids. An attractive feature of these methods is the application of the Jacobian matrix based on the P1 element approximation, resulting in a huge reduction of memory requirement compared with DG(P2). Three approaches -- analytical derivation, divided differencing, and automatic differentiation (AD) -- are presented to construct the Jacobian matrix, of which the AD approach shows the best robustness. A variety of compressible flow problems are computed to demonstrate the fast convergence property of the implemented flow solver. Furthermore, an SPMD (single program, multiple data) programming paradigm based on MPI is proposed to achieve parallelism. The numerical results on complex geometries indicate that this low-storage implicit method can provide a viable and attractive DG solution for complicated flows of practical importance.
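
    As a concrete illustration of the SPMD paradigm named above: every MPI rank runs the same executable and derives its own share of the work from its rank. The C sketch below is illustrative only; the problem size and per-element work are placeholders, not the paper's DG solver.

      /* spmd.c -- minimal SPMD sketch: compile with mpicc, run under mpirun.
         N and the per-element "work" are hypothetical stand-ins. */
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          const int N = 1000000;                       /* global size (assumed) */
          int lo = (int)((long)rank * N / size);       /* this rank's slice */
          int hi = (int)((long)(rank + 1) * N / size);

          double local = 0.0, global = 0.0;
          for (int i = lo; i < hi; i++)
              local += 1.0 / (1.0 + i);                /* stand-in for local work */

          /* combine partial results across all ranks */
          MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0) printf("sum = %f\n", global);
          MPI_Finalize();
          return 0;
      }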

  9. Machine Learning Based Online Performance Prediction for Runtime Parallelization and Task Scheduling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, J; Ma, X; Singh, K

    2008-10-09

    With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. Converting existing sequential applications as well as developing next-generation software requires assistance from hardware, compilers and runtime systems to exploit parallelism transparently within applications. These systems must decompose applications into tasks that can be executed in parallel and then schedule those tasks to minimize load imbalance. However, many systems lack a priori knowledge about the execution time of all tasks to perform effective load balancing with low scheduling overhead. In this paper, we approach this fundamental problem using machine learning techniques, first to generate performance models for all tasks and then to apply those models to perform automatic performance prediction across program executions. We also extend an existing scheduling algorithm to use generated task cost estimates for online task partitioning and scheduling. We implement the above techniques in the pR framework, which transparently parallelizes scripts in the popular R language, and evaluate their performance and overhead with both a real-world application and a large number of synthetic representative test scripts. Our experimental results show that our proposed approach significantly improves task partitioning and scheduling, with maximum improvements of 21.8%, 40.3% and 22.1% and average improvements of 15.9%, 16.9% and 4.2% for LMM (a real R application) and synthetic test cases with independent and dependent tasks, respectively.

  10. Manyscale Computing for Sensor Processing in Support of Space Situational Awareness

    NASA Astrophysics Data System (ADS)

    Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.

    2014-09-01

    Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.

  11. Automatic selection of dynamic data partitioning schemes for distributed memory multicomputers

    NASA Technical Reports Server (NTRS)

    Palermo, Daniel J.; Banerjee, Prithviraj

    1995-01-01

    For distributed memory multicomputers such as the Intel Paragon, the IBM SP-2, the NCUBE/2, and the Thinking Machines CM-5, the quality of the data partitioning for a given application is crucial to obtaining high performance. This task has traditionally been the user's responsibility, but in recent years much effort has been directed to automating the selection of data partitioning schemes. Several researchers have proposed systems that are able to produce data distributions that remain in effect for the entire execution of an application. For complex programs, however, such static data distributions may be insufficient to obtain acceptable performance. The selection of distributions that dynamically change over the course of a program's execution adds another dimension to the data partitioning problem. In this paper, we present a technique that can be used to automatically determine which partitionings are most beneficial over specific sections of a program while taking into account the added overhead of performing redistribution. This system is being built as part of the PARADIGM (PARAllelizing compiler for DIstributed memory General-purpose Multicomputers) project at the University of Illinois. The complete system will provide a fully automated means to parallelize programs written in a serial programming model obtaining high performance on a wide range of distributed-memory multicomputers.

  12. [Role of Hormones in Perinatal and Early Postnatal Development: Possible Contribution to Programming/Imprinting Phenomena].

    PubMed

    Goudochnikov, V I

    2015-01-01

    In parallel to the formulation of the paradigm of developmental origins of health and disease (DOHaD), the search began for mechanisms of programming/imprinting in ontogeny. Some recent evidence has revealed the important role of glucocorticoids in such mechanisms. However, in the last decades numerous data have been accumulated on the participation of other hormones in developmental bioregulation. In the present article we analyze these data as they refer to melatonin, but also to neuroactive steroids, somatolactogens and related peptides: insulin-like growth factor type I (IGF-I) and oxytocin, i.e., peptide regulators related to growth and lactation, respectively. Special attention is devoted to the evidence of glucocorticoid interactions with some of these hormones.

  13. Scaling NASA Applications to 1024 CPUs on Origin 3K

    NASA Technical Reports Server (NTRS)

    Taft, Jim

    2002-01-01

    The long and highly successful joint SGI-NASA research effort in ever larger SSI systems was to a large degree the result of the successful development of the MLP scalable parallel programming paradigm developed at ARC: 1) MLP scaling in real production codes justified ever larger systems at NAS; 2) MLP scaling on the 256p Origin 2000 gave SGI the impetus to productize 256p systems; 3) MLP scaling on 512p gave SGI the confidence to build the 1024p O3K; and 4) the history of MLP success resulted in the IBM Star Cluster based MLP effort.

  14. Proxy-equation paradigm: A strategy for massively parallel asynchronous computations

    NASA Astrophysics Data System (ADS)

    Mittal, Ankita; Girimaji, Sharath

    2017-09-01

    Massively parallel simulations of transport equation systems call for a paradigm change in algorithm development to achieve efficient scalability. Traditional approaches require time synchronization of processing elements (PEs), which severely restricts scalability. Relaxing the synchronization requirement introduces error and slows down convergence. In this paper, we propose and develop a novel "proxy equation" concept for a general transport equation that (i) tolerates asynchrony with minimal added error, (ii) preserves convergence order, and thus (iii) is expected to scale efficiently on massively parallel machines. The central idea is to modify a priori the transport equation at the PE boundaries to offset asynchrony errors. Proof-of-concept computations are performed using a one-dimensional advection (convection) diffusion equation. The results demonstrate the promise and advantages of the present strategy.
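
    For orientation, the C sketch below shows the conventional, fully synchronized scheme such work takes as its baseline: a 1D explicit diffusion update with ghost-cell exchange at PE boundaries every step. The proxy-equation modification itself (altering the equation at the boundaries to offset asynchrony error) is the paper's contribution and is not reproduced here; sizes and coefficients are assumed.

      /* diffuse.c -- synchronous baseline: each PE owns NLOC interior cells
         plus two ghost cells exchanged with its neighbours every step. */
      #include <mpi.h>
      #include <string.h>

      #define NLOC 1024                      /* interior cells per PE (assumed) */

      static double u[NLOC + 2];             /* u[0], u[NLOC+1] are ghost cells */

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          int left = (rank - 1 + size) % size, right = (rank + 1) % size;
          double nu = 0.25;                  /* diffusion number dt*alpha/dx^2 */

          for (int step = 0; step < 100; step++) {
              /* synchronous ghost-cell exchange with both neighbours */
              MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0, &u[NLOC + 1], 1,
                           MPI_DOUBLE, right, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              MPI_Sendrecv(&u[NLOC], 1, MPI_DOUBLE, right, 1, &u[0], 1,
                           MPI_DOUBLE, left, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

              double unew[NLOC + 2];
              for (int i = 1; i <= NLOC; i++)    /* explicit Euler update */
                  unew[i] = u[i] + nu * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
              memcpy(&u[1], &unew[1], NLOC * sizeof(double));
          }
          MPI_Finalize();
          return 0;
      }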

  15. Hyperswitch communication network

    NASA Technical Reports Server (NTRS)

    Peterson, J.; Pniel, M.; Upchurch, E.

    1991-01-01

    The Hyperswitch Communication Network (HCN) is a large scale parallel computer prototype being developed at JPL. Commercial versions of the HCN computer are planned. The HCN computer being designed is a message passing multiple instruction multiple data (MIMD) computer, and offers many advantages in price-performance ratio, reliability and availability, and manufacturing over traditional uniprocessors and bus based multiprocessors. The HCN operating system provides a uniquely flexible environment that combines both parallel processing and distributed processing. This programming paradigm can achieve a balance among the following competing factors: performance in processing and communications, user friendliness, and fault tolerance. The prototype is being designed to accommodate a maximum of 64 state of the art microprocessors. The HCN is classified as a distributed supercomputer. The HCN system is described, and the performance/cost analysis and other competing factors within the system design are reviewed.

  16. Kappa and Rater Accuracy: Paradigms and Parameters

    ERIC Educational Resources Information Center

    Conger, Anthony J.

    2017-01-01

    Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…

  17. Critical role of cerebellar fastigial nucleus in programming sequences of saccades

    PubMed Central

    King, Susan A.; Schneider, Rosalyn M.; Serra, Alessandro; Leigh, R. John

    2011-01-01

    The cerebellum plays an important role in programming accurate saccades. Cerebellar lesions affecting the ocular motor region of the fastigial nucleus (FOR) cause saccadic hypermetria; however, if a second target is presented before a saccade can be initiated (double-step paradigm), saccade hypermetria may be decreased. We tested the hypothesis that the cerebellum, especially FOR, plays a pivotal role in programming sequences of saccades. We studied patients with saccadic hypermetria due either to genetic cerebellar ataxia or surgical lesions affecting FOR and confirmed that the gain of initial saccades made to double-step stimuli was reduced compared with the gain of saccades to single target jumps. Based on measurements of the intersaccadic interval, we found that the ability to perform parallel processing of saccades was reduced or absent in all of our patients with cerebellar disease. Our results support the crucial role of the cerebellum, especially FOR, in programming sequences of saccades. PMID:21950988

  18. A DNA network as an information processing system.

    PubMed

    Santini, Cristina Costa; Bath, Jonathan; Turberfield, Andrew J; Tyrrell, Andy M

    2012-01-01

    Biomolecular systems that can process information are sought for computational applications, because of their potential for parallelism and miniaturization and because their biocompatibility also makes them suitable for future biomedical applications. DNA has been used to design machines, motors, finite automata, logic gates, reaction networks and logic programs, amongst many other structures and dynamic behaviours. Here we design and program a synthetic DNA network to implement computational paradigms abstracted from cellular regulatory networks. These show information processing properties that are desirable in artificial, engineered molecular systems, including robustness of the output in relation to different sources of variation. We show the results of numerical simulations of the dynamic behaviour of the network and preliminary experimental analysis of its main components.

  19. Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

    NASA Technical Reports Server (NTRS)

    Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele

    2004-01-01

    In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms, and discuss OpenMP implementation issues which affect the performance of multi-level parallel applications.
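
    The NanosCompiler's grouping clauses are vendor extensions, but the two-level structure itself can be sketched with standard nested OpenMP. A minimal C example, assuming an OpenMP 3.0+ compiler; the thread counts are arbitrary:

      /* nested.c -- standard nested OpenMP sketch (no NanosCompiler
         extensions); compile with e.g. gcc -fopenmp nested.c */
      #include <omp.h>
      #include <stdio.h>

      int main(void) {
          omp_set_max_active_levels(2);            /* allow two nesting levels */
          #pragma omp parallel num_threads(2)      /* outer level: e.g. zones */
          {
              int zone = omp_get_thread_num();
              #pragma omp parallel num_threads(4)  /* inner level: loop work */
              {
                  printf("zone %d, inner thread %d\n",
                         zone, omp_get_thread_num());
              }
          }
          return 0;
      }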

  20. Communication Studies of DMP and SMP Machines

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems, bitonic sorting and the Fast Fourier Transform, are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in the Message-Passing Interface for portability, and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors is consistent with the size of messages. The SP-2 is sensitive to message size but yields much higher communication overlapping because of its communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields low communication overlapping. Bitonic sorting yields lower performance than FFT due to a smaller computation-to-communication ratio.
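
    The overlap capability probed here is what nonblocking MPI exposes: communication is initiated, independent computation proceeds, and completion is waited on afterwards. A minimal C sketch; the message size and filler computation are assumptions, not the benchmark codes:

      /* overlap.c -- overlapping communication with computation via
         nonblocking MPI calls; ranks are paired with their neighbour. */
      #include <mpi.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          enum { N = 1 << 16 };                 /* message size (assumed) */
          static double sendbuf[N], recvbuf[N], work[N];
          int peer = rank ^ 1;                  /* pair up adjacent ranks */
          MPI_Request req[2];

          if (peer < size) {
              MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req[0]);
              MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req[1]);
              for (int i = 0; i < N; i++)       /* independent work proceeds */
                  work[i] = work[i] * 0.5 + 1.0;/* ... while messages are in flight */
              MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
          }
          MPI_Finalize();
          return 0;
      }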

  1. High performance computing for deformable image registration: towards a new paradigm in adaptive radiotherapy.

    PubMed

    Samant, Sanjiv S; Xia, Junyi; Muyan-Ozcelik, Pinar; Owens, John D

    2008-08-01

    The advent of readily available temporal imaging or time series volumetric (4D) imaging has become an indispensable component of treatment planning and adaptive radiotherapy (ART) at many radiotherapy centers. Deformable image registration (DIR) is also used in other areas of medical imaging, including motion corrected image reconstruction. Due to long computation times, clinical applications of DIR in radiation therapy and elsewhere have been limited and consequently relegated to offline analysis. With the recent advances in hardware and software, graphics processing unit (GPU) based computing is an emerging technology for general purpose computation, including DIR, and is suitable for highly parallelized computing. However, traditional general purpose computation on the GPU is limited because of the constraints of the available programming platforms. As well, compared to CPU programming, the GPU currently has reduced dedicated processor memory, which can limit the useful working data set for parallelized processing. We present an implementation of the demons algorithm using the NVIDIA 8800 GTX GPU and the new CUDA programming language. The GPU performance is compared with single threading and multithreading CPU implementations on an Intel dual core 2.4 GHz CPU using the C programming language. CUDA provides a C-like programming interface, and allows for direct access to the highly parallel compute units in the GPU. Comparisons for volumetric clinical lung images acquired using 4DCT were carried out. Computation times for 100 iterations in the range of 1.8-13.5 s were observed for the GPU, with image sizes ranging from 2.0 x 10^6 to 14.2 x 10^6 pixels. The GPU registration was 55-61 times faster than the CPU for the single threading implementation, and 34-39 times faster for the multithreading implementation. For CPU based computing, the computational time generally has a linear dependence on image size for medical imaging data. Computational efficiency is characterized in terms of time per megapixel per iteration (TPMI), with units of seconds per megapixel per iteration (spmi). For the demons algorithm, our CPU implementation yielded largely invariant values of TPMI. The mean TPMIs were 0.527 spmi and 0.335 spmi for the single threading and multithreading cases, respectively, with <2% variation over the considered image data range. For GPU computing, we achieved TPMI = 0.00916 spmi with 3.7% variation, indicating optimized memory handling under CUDA. The paradigm of GPU based real-time DIR opens up a host of clinical applications for medical imaging.
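
    The TPMI metric reduces to simple arithmetic, as the short C sketch below shows. The numbers plugged in are the figures quoted above, not new measurements:

      /* tpmi.c -- the efficiency metric defined in the abstract: time per
         megapixel per iteration (spmi). */
      #include <stdio.h>

      double tpmi(double seconds, double megapixels, int iterations) {
          return seconds / (megapixels * iterations);
      }

      int main(void) {
          /* ~13.5 s, 100 iterations, 14.2 megapixels -> ~0.0095 spmi,
             the same order as the reported GPU mean of 0.00916 spmi */
          printf("TPMI = %.5f spmi\n", tpmi(13.5, 14.2, 100));
          return 0;
      }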

  2. Evaluating perceptual integration: uniting response-time- and accuracy-based methodologies.

    PubMed

    Eidels, Ami; Townsend, James T; Hughes, Howard C; Perry, Lacey A

    2015-02-01

    This investigation brings together a response-time system identification methodology (e.g., Townsend & Wenger Psychonomic Bulletin & Review 11, 391-418, 2004a) and an accuracy methodology, intended to assess models of integration across stimulus dimensions (features, modalities, etc.) that were proposed by Shaw and colleagues (e.g., Mulligan & Shaw Perception & Psychophysics 28, 471-478, 1980). The goal was to theoretically examine these separate strategies and to apply them conjointly to the same set of participants. The empirical phases were carried out within an extension of an established experimental design called the double factorial paradigm (e.g., Townsend & Nozawa Journal of Mathematical Psychology 39, 321-359, 1995). That paradigm, based on response times, permits assessments of architecture (parallel vs. serial processing), stopping rule (exhaustive vs. minimum time), and workload capacity, all within the same blocks of trials. The paradigm introduced by Shaw and colleagues uses a statistic formally analogous to that of the double factorial paradigm, but based on accuracy rather than response times. We demonstrate that the accuracy measure cannot discriminate between parallel and serial processing. Nonetheless, the class of models supported by the accuracy data possesses a suitable interpretation within the same set of models supported by the response-time data. The supported model, consistent across individuals, is parallel and has limited capacity, with the participants employing the appropriate stopping rule for the experimental setting.

  3. A new paradigm (Westphal-Paradigm) to study the neural correlates of panic disorder with agoraphobia.

    PubMed

    Wittmann, A; Schlagenhauf, F; John, T; Guhn, A; Rehbein, H; Siegmund, A; Stoy, M; Held, D; Schulz, I; Fehm, L; Fydrich, T; Heinz, A; Bruhn, H; Ströhle, A

    2011-04-01

    Agoraphobia (with and without panic disorder) is a highly prevalent and disabling anxiety disorder. Its neural complexity can be characterized by specific cues in fMRI studies. Therefore, we developed an fMRI paradigm with agoraphobia-specific stimuli. Pictures of potential agoraphobic situations were generated. Twenty-six patients, suffering from panic disorder and agoraphobia, and 22 healthy controls rated the pictures with respect to arousal, valence, and agoraphobia-related anxiety. The 96 pictures which discriminated best between groups were chosen, split into two parallel sets, and supplemented with matched neutral pictures from the International Affective Picture System. Reliability, criterion, and construct validity of the picture set were determined in a second sample (44 patients, 28 controls). The resulting event-related "Westphal-Paradigm" with cued and uncued pictures was tested in an fMRI pilot study with 16 patients. Internal consistency of the sets was very high, and parallelism was established. Positive correlations of picture ratings with Mobility Inventory and Hamilton anxiety scores support construct validity. FMRI data revealed activations in areas associated with the fear circuit, including the amygdala, insula, and hippocampal areas. Psychometric properties of the Westphal-Paradigm meet the necessary quality requirements for further scientific use. The paradigm reliably produces behavioral and fMRI patterns in response to agoraphobia-specific stimuli. To our knowledge, it is the first fMRI paradigm with these properties. This paradigm can be used to further characterize the functional neuroanatomy of panic disorder and agoraphobia and might be useful to contribute data to the differentiation of panic disorder and agoraphobia as related, but conceptually different, clinical disorders.

  4. Parallel numerical modeling of hybrid-dimensional compositional non-isothermal Darcy flows in fractured porous media

    NASA Astrophysics Data System (ADS)

    Xing, F.; Masson, R.; Lopez, S.

    2017-09-01

    This paper introduces a new discrete fracture model accounting for non-isothermal compositional multiphase Darcy flows and complex networks of fractures with intersecting, immersed and non-immersed fractures. The so-called hybrid-dimensional model, using a 2D model in the fractures coupled with a 3D model in the matrix, is first derived rigorously starting from the equi-dimensional matrix-fracture model. Then, it is discretized using a fully implicit time integration combined with the Vertex Approximate Gradient (VAG) finite volume scheme, which is adapted to polyhedral meshes and anisotropic heterogeneous media. The fully coupled systems are assembled and solved in parallel using the Single Program Multiple Data (SPMD) paradigm with one layer of ghost cells. This strategy allows for a local assembly of the discrete systems. An efficient preconditioner is implemented to solve the linear systems at each time step and each Newton type iteration of the simulation. The numerical efficiency of our approach is assessed on different meshes, fracture networks, and physical settings in terms of parallel scalability, nonlinear convergence and linear convergence.

  5. Accelerated Adaptive MGS Phase Retrieval

    NASA Technical Reports Server (NTRS)

    Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang

    2011-01-01

    The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm that allows parallel processing of applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time, by performing the matrix calculations on nVidia graphics cards. The graphics processing unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model called CUDA is used to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphic cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.

  6. Multiple Independent File Parallel I/O with HDF5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, M. C.

    2016-07-13

    The HDF5 library has supported the I/O requirements of HPC codes at Lawrence Livermore National Labs (LLNL) since the late 90's. In particular, HDF5 used in the Multiple Independent File (MIF) parallel I/O paradigm has supported LLNL codes' scalable I/O requirements and has recently been gainfully used at scales as large as O(10^6) parallel tasks.
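
    In the MIF pattern, ranks are divided into groups, each group writes to its own file, and a baton serializes the writers within a group, so no file ever has concurrent writers. A minimal C sketch under those assumptions (file names, group count, and the omitted dataset writes are illustrative, not LLNL code):

      /* mif.c -- Multiple Independent File sketch: one file per group of
         ranks, baton-passed within the group. Build with an MPI compiler
         and link against HDF5 (e.g. h5pcc or mpicc -lhdf5). */
      #include <mpi.h>
      #include <hdf5.h>
      #include <stdio.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, size, nfiles = 4;           /* number of files (assumed) */
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          int group = rank % nfiles;            /* which file this rank uses */
          char name[64];
          snprintf(name, sizeof name, "dump_%03d.h5", group);

          int token = 0;
          if (rank >= nfiles)                   /* wait for the previous writer */
              MPI_Recv(&token, 1, MPI_INT, rank - nfiles, 0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);

          hid_t f = (rank < nfiles)             /* first writer creates the file */
              ? H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT)
              : H5Fopen(name, H5F_ACC_RDWR, H5P_DEFAULT);
          /* ... each rank writes its own datasets here ... */
          H5Fclose(f);

          if (rank + nfiles < size)             /* pass the baton onward */
              MPI_Send(&token, 1, MPI_INT, rank + nfiles, 0, MPI_COMM_WORLD);
          MPI_Finalize();
          return 0;
      }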

  7. Exploring the Educational Benefits of Introducing Aspect-Oriented Programming Into a Programming Course

    ERIC Educational Resources Information Center

    Boticki, I.; Katic, M.; Martin, S.

    2013-01-01

    This paper explores the educational benefits of introducing the aspect-oriented programming paradigm into a programming course in a study on a sample of 75 undergraduate software engineering students. It discusses how using the aspect-oriented paradigm, in addition to the object-oriented programming paradigm, affects students' programs, their exam…

  8. Fully parallel write/read in resistive synaptic array for accelerating on-chip learning

    NASA Astrophysics Data System (ADS)

    Gao, Ligang; Wang, I.-Ting; Chen, Pai-Yu; Vrudhula, Sarma; Seo, Jae-sun; Cao, Yu; Hou, Tuo-Hung; Yu, Shimeng

    2015-11-01

    A neuro-inspired computing paradigm beyond the von Neumann architecture is emerging; it generally takes advantage of massive parallelism and is aimed at complex tasks that involve intelligence and learning. The cross-point array architecture with synaptic devices has been proposed for on-chip implementation of the weighted sum and weight update in learning algorithms. In this work, forming-free, silicon-process-compatible Ta/TaOx/TiO2/Ti synaptic devices are fabricated, in which >200 levels of conductance states can be continuously tuned by identical programming pulses. In order to demonstrate the advantages of parallelism of the cross-point array architecture, a novel fully parallel write scheme is designed and experimentally demonstrated in a small-scale crossbar array to accelerate the weight update in the training process, at a speed that is independent of the array size. Compared to the conventional row-by-row write scheme, it achieves >30× speed-up and >30× improvement in energy efficiency as projected in a large-scale array. If realistic synaptic device characteristics such as device variations are taken into account in an array-level simulation, the proposed array architecture is able to achieve ∼95% recognition accuracy on MNIST handwritten digits, which is close to the accuracy achieved by software using the ideal sparse coding algorithm.

  9. Processing Solutions for Big Data in Astronomy

    NASA Astrophysics Data System (ADS)

    Fillatre, L.; Lepiller, D.

    2016-09-01

    This paper gives a simple introduction to processing solutions applied to massive amounts of data. It proposes a general presentation of the Big Data paradigm. The Hadoop framework, which is considered as the pioneering processing solution for Big Data, is described together with YARN, the integrated Hadoop tool for resource allocation. This paper also presents the main tools for the management of both the storage (NoSQL solutions) and computing capacities (MapReduce parallel processing schema) of a cluster of machines. Finally, more recent processing solutions like Spark are discussed. Big Data frameworks are now able to run complex applications while keeping the programming simple and greatly improving the computing speed.

  10. Exploiting variability for energy optimization of parallel programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lavrijsen, Wim; Iancu, Costin; de Jong, Wibe

    2016-04-18

    In this paper we present optimizations that use DVFS mechanisms to reduce the total energy usage in scientific applications. Our main insight is that noise is intrinsic to large scale parallel executions and it appears whenever shared resources are contended. The presence of noise allows us to identify and manipulate any program regions amenable to DVFS. When compared to previous energy optimizations that make per core decisions using predictions of the running time, our scheme uses a qualitative approach to recognize the signature of executions amenable to DVFS. By recognizing the "shape of variability" we can optimize codes with highly dynamic behavior, which pose challenges to all existing DVFS techniques. We validate our approach using offline and online analyses for one-sided and two-sided communication paradigms. We have applied our methods to NWChem, and we show best case improvements in energy use of 12% at no loss in performance when using online optimizations running on 720 Haswell cores with one-sided communication. With NWChem on MPI two-sided and offline analysis, capturing the initialization, we find energy savings of up to 20%, with less than 1% performance cost.

  11. Critical role of cerebellar fastigial nucleus in programming sequences of saccades.

    PubMed

    King, Susan A; Schneider, Rosalyn M; Serra, Alessandro; Leigh, R John

    2011-09-01

    The cerebellum plays an important role in programming accurate saccades. Cerebellar lesions affecting the ocular motor region of the fastigial nucleus (FOR) cause saccadic hypermetria; however, if a second target is presented before a saccade can be initiated (double-step paradigm), saccade hypermetria may be decreased. We tested the hypothesis that the cerebellum, especially FOR, plays a pivotal role in programming sequences of saccades. We studied patients with saccadic hypermetria because of either genetic cerebellar ataxia or surgical lesions affecting FOR and confirmed that the gain of initial saccades made to double-step stimuli was reduced compared with the gain of saccades to single target jumps. Based on measurements of the intersaccadic interval, we found that the ability to perform parallel processing of saccades was reduced or absent in all of our patients with cerebellar disease. Our results support the crucial role of the cerebellum, especially FOR, in programming sequences of saccades. © 2011 New York Academy of Sciences.

  12. Writing Parallel Parameter Sweep Applications with pMatlab

    DTIC Science & Technology

    2011-01-01

    formulate this type of problem in a leader-worker paradigm. The SETI@Home project is a well-known leader-worker parallel application [1]. The SETI ...their results back to the SETI@Home servers when they are done computing the job. Because each job is independent, it does not matter if the 415th job
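
    A leader-worker program of the kind described can be sketched directly in C with MPI: rank 0 hands out independent job indices on demand and workers return results. The job payload below is a stand-in, not pMatlab code:

      /* leader_worker.c -- minimal leader-worker sketch: workers request
         work by sending a result; the leader replies with a job index,
         or -1 when no jobs remain. */
      #include <mpi.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, size, njobs = 100;          /* job count (assumed) */
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          if (rank == 0) {                      /* leader */
              int next = 0, done = 0;
              MPI_Status st;
              double result;
              while (done < size - 1) {
                  MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                           MPI_COMM_WORLD, &st);
                  int job = (next < njobs) ? next++ : -1;   /* -1 = stop */
                  if (job < 0) done++;
                  MPI_Send(&job, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
              }
          } else {                              /* worker */
              double result = 0.0;
              int job;
              for (;;) {
                  MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
                  MPI_Recv(&job, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                           MPI_STATUS_IGNORE);
                  if (job < 0) break;
                  result = job * 2.0;           /* stand-in for real work */
              }
          }
          MPI_Finalize();
          return 0;
      }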

  13. A portable approach for PIC on emerging architectures

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor

    2016-03-01

    A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programming, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving, with interesting developments in Co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE grants.
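
    A minimal C sketch of the three levels in combination: MPI across nodes, an OpenMP parallel loop within a node, and an explicitly vectorizable (SIMD) inner loop. The array sizes and update are placeholders, not PICKSC code:

      /* three_level.c -- the three paradigms in combination: MPI across
         nodes, OpenMP threads within a node, SIMD vectors within a core.
         Compile with e.g. mpicc -fopenmp (OpenMP 4.0+ for "omp simd"). */
      #include <mpi.h>

      #define N 4096

      static float a[N], b[N];

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);              /* level 3: distributed memory */

          #pragma omp parallel for             /* level 2: shared memory */
          for (int tile = 0; tile < N; tile += 256) {
              #pragma omp simd                 /* level 1: vector (SIMD) */
              for (int i = tile; i < tile + 256; i++)
                  a[i] += 0.5f * b[i];
          }

          MPI_Finalize();
          return 0;
      }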

  14. Nonlinear Real-Time Optical Signal Processing

    DTIC Science & Technology

    1990-09-01

    pattern recognition. Additional work concerns the relationship of parallel computation paradigms to optical computing and halftone screen techniques...paradigms to optical computing and halftone screen techniques for implementing general nonlinear functions. 2 Research Progress This section...Vol. 23, No. 8, pp. 34-57, 1986. 2.4 Nonlinear Optical Processing with Halftones: Degradation and Compensation Models. This paper is concerned with

  15. Standardization and validation of a parallel form of the verbal and non-verbal recognition memory test in an Italian population sample.

    PubMed

    Smirni, Daniela; Smirni, Pietro; Di Martino, Giovanni; Cipolotti, Lisa; Oliveri, Massimiliano; Turriziani, Patrizia

    2018-05-04

    In the neuropsychological assessment of several neurological conditions, evaluation of recognition memory is required. Recognition seems to be more appropriate than recall for studying verbal and non-verbal memory, because interference from psychological and emotional disorders is less relevant in recognition than in recall memory paradigms. In many neurological disorders, longitudinal repeated assessments are needed to monitor the effectiveness of rehabilitation programs or pharmacological treatments on the recovery of memory. In order to contain the practice effect in repeated neuropsychological evaluations, the use of parallel forms of the tests is necessary. Having two parallel forms of the same test that keep administration procedures and scoring constant is a great advantage both in clinical practice, for the monitoring of memory disorders, and in experimental practice, to allow the repeated evaluation of memory in healthy and neurological subjects. The first aim of the present study was to provide normative values in an Italian sample (n = 160) for a parallel form of a verbal and non-verbal recognition memory battery. Multiple regression analysis revealed significant effects of age and education on recognition memory performance, whereas sex did not reach a significant probability level. Inferential cutoffs were determined and equivalent scores computed. Secondly, the study aimed to validate the equivalence of the two parallel forms of the Recognition Memory Test. The correlation analyses between the total scores of the two versions of the test and between the three subtasks revealed that the two forms are parallel and the subtasks are equivalent in difficulty.

  16. A Parallel Vector Machine for the PM Programming Language

    NASA Astrophysics Data System (ADS)

    Bellerby, Tim

    2016-04-01

    PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes, will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses, respectively: www.pm-lang.org.
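
    Masked vector execution of the kind described (an if-branch compiled into a Boolean mask applied to a whole-vector operation) can be sketched in a few lines of C; the names are illustrative, not PM's own instruction set:

      /* mask.c -- masked vector operation: the "if" test becomes a mask,
         and the operation is applied only where the mask is set. */
      #include <stdbool.h>
      #include <stdio.h>

      #define N 8

      /* apply y[i] = 2*x[i] only where the mask is set */
      void masked_scale(double *y, const double *x, const bool *mask) {
          for (int i = 0; i < N; i++)
              if (mask[i]) y[i] = 2.0 * x[i];
      }

      int main(void) {
          double x[N] = {1, 2, 3, 4, 5, 6, 7, 8}, y[N] = {0};
          bool mask[N];
          for (int i = 0; i < N; i++) mask[i] = (x[i] > 4.0);  /* the "if" test */
          masked_scale(y, x, mask);
          for (int i = 0; i < N; i++) printf("%g ", y[i]);
          printf("\n");
          return 0;
      }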

  17. Scalable DB+IR Technology: Processing Probabilistic Datalog with HySpirit.

    PubMed

    Frommholz, Ingo; Roelleke, Thomas

    2016-01-01

    Probabilistic Datalog (PDatalog, proposed in 1995) is a probabilistic variant of Datalog and a nice conceptual idea to model Information Retrieval in a logical, rule-based programming paradigm. Making PDatalog work in real-world applications requires more than probabilistic facts and rules, and the semantics associated with the evaluation of the programs. We report in this paper some of the key features of the HySpirit system required to scale the execution of PDatalog programs. Firstly, there is the requirement to express probability estimation in PDatalog. Secondly, fuzzy-like predicates are required to model vague predicates (e.g. vague match of attributes such as age or price). Thirdly, to handle large data sets there are scalability issues to be addressed, and therefore, HySpirit provides probabilistic relational indexes and parallel and distributed processing. The main contribution of this paper is a consolidated view of the methods of the HySpirit system to make PDatalog applicable in real-scale applications that involve a wide range of requirements typical for data (information) management and analysis.

  18. Parallel Object Activation and Attentional Gating of Information: Evidence from Eye Movements in the Multiple Object Naming Paradigm

    ERIC Educational Resources Information Center

    Schotter, Elizabeth R.; Ferreira, Victor S.; Rayner, Keith

    2013-01-01

    Do we access information from any object we can see, or do we access information only from objects that we intend to name? In 3 experiments using a modified multiple object naming paradigm, subjects were required to name several objects in succession when previews appeared briefly and simultaneously in the same location as the target as well as at…

  19. The impact of sterile neutrinos on CP measurements at long baselines

    DOE PAGES

    Gandhi, Raj; Kayser, Boris; Masud, Mehedi; ...

    2015-09-01

    With the Deep Underground Neutrino Experiment (DUNE) as an example, we show that the presence of even one sterile neutrino of mass ~1 eV can significantly impact the measurements of CP violation in long baseline experiments. Using a probability level analysis and neutrino-antineutrino asymmetry calculations, we discuss the large magnitude of these effects, and show how they translate into significant event rate deviations at DUNE. These results demonstrate that measurements which, when interpreted in the context of the standard three family paradigm, indicate CP conservation at long baselines may, in fact, hide large CP violation if there is a sterile state. Similarly, any data indicating the violation of CP cannot be properly interpreted within the standard paradigm unless the presence of sterile states of mass O(1 eV) can be conclusively ruled out. Our work underscores the need for a parallel and linked short baseline oscillation program and a highly capable near detector for DUNE, in order that its highly anticipated results on CP violation in the lepton sector may be correctly interpreted.

  20. The impact of sterile neutrinos on CP measurements at long baselines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gandhi, Raj; Kayser, Boris; Masud, Mehedi

    With the Deep Underground Neutrino Experiment (DUNE) as an example, we show that the presence of even one sterile neutrino of mass ~1 eV can significantly impact the measurements of CP violation in long baseline experiments. Using a probability level analysis and neutrino-antineutrino asymmetry calculations, we discuss the large magnitude of these effects, and show how they translate into significant event rate deviations at DUNE. These results demonstrate that measurements which, when interpreted in the context of the standard three family paradigm, indicate CP conservation at long baselines may, in fact, hide large CP violation if there is a sterile state. Similarly, any data indicating the violation of CP cannot be properly interpreted within the standard paradigm unless the presence of sterile states of mass O(1 eV) can be conclusively ruled out. Our work underscores the need for a parallel and linked short baseline oscillation program and a highly capable near detector for DUNE, in order that its highly anticipated results on CP violation in the lepton sector may be correctly interpreted.

  1. Fortran for the nineties

    NASA Technical Reports Server (NTRS)

    Himer, J. T.

    1992-01-01

    Fortran has largely enjoyed prominence for the past few decades as the computer programming language of choice for numerically intensive scientific, engineering, and process control applications. Fortran's well understood static language syntax has allowed parsers and compiler optimization technologies to generate among the most efficient and fastest run-time executables, particularly on high-end scalar and vector supercomputers. Computing architectures and paradigms have changed considerably since the last ANSI/ISO Fortran release in 1978, and while FORTRAN 77 has more than survived, its aged features provide only partial functionality for today's demanding computing environments. The simple block procedural languages have been necessarily evolving, or giving way, to specialized supercomputing, network resource, and object-oriented paradigms. To address these new computing demands, ANSI has worked for the last 12 years, with three international public reviews, to deliver Fortran 90. Fortran 90 has superseded and replaced ISO FORTRAN 77 internationally as the sole Fortran standard; in the US, Fortran 90 is expected to be adopted as the ANSI standard this summer, coexisting with ANSI FORTRAN 77 until at least 1996. The development path and current state of Fortran will be briefly described, highlighting the many new Fortran 90 syntactic and semantic additions which support (among others): free form source; array syntax; new control structures; modules and interfaces; pointers; derived data types; dynamic memory; enhanced I/O; operator overloading; data abstraction; user optional arguments; new intrinsics for array and bit manipulation and system inquiry; and enhanced portability through better generic control of underlying system arithmetic models. Examples from dynamical astronomy and signal and image processing will attempt to illustrate Fortran 90's applicability to today's general scalar, vector, and parallel scientific and engineering requirements and object oriented programming paradigms. Time permitting, current work proceeding on the future development of Fortran 2000 and collateral standards will be introduced.

  2. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    PubMed Central

    2010-01-01

    Background Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. Description An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Conclusions Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms. PMID:21210976

  3. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.

    PubMed

    Taylor, Ronald C

    2010-12-21

    Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms.

  4. Message Passing vs. Shared Address Space on a Cluster of SMPs

    NASA Technical Reports Server (NTRS)

    Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak

    2000-01-01

    The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has made these systems an attractive platform for high-end scientific computing. Currently, message-passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message-passing has been standardized with MPI, and is the most common and mature programming approach. However, message-passing code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of, and programming effort required for, six applications under both programming models on a 32-CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared memory programming, due to their high communication-to-computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications; however, on certain classes of problems SAS performance is competitive with MPI. We also present new algorithms for improving the PC cluster performance of MPI collective operations.
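
    The MPI collective operations mentioned at the end are single calls that combine values across all ranks; a global sum via MPI_Allreduce, for example, is the message-passing counterpart of a shared-memory reduction. A minimal C sketch:

      /* allreduce.c -- a global sum as a single MPI collective call;
         every rank contributes one value and every rank gets the total. */
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          double local = (double)rank;   /* each rank's contribution */
          double total = 0.0;
          MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

          printf("rank %d sees total %g\n", rank, total);
          MPI_Finalize();
          return 0;
      }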

  5. Convergent paradigms for visual neuroscience and dissociative identity disorder.

    PubMed

    Manning, Mark L; Manning, Rana L

    2009-01-01

    Although dissociative identity disorder, a condition in which multiple individuals appear to inhabit a single body, is a recognized psychiatric disorder, patients may yet encounter health professionals who declare that they simply "do not believe in multiple personalities." This article explores the proposal that resistance to the disorder represents a failure to apply an appropriate paradigm from which the disorder should be interpreted. Trauma and sociocognitive explanations of dissociative identity disorder are contrasted. The trauma hypothesis is further differentiated into paradigms in which trauma affects a defense mechanism, and one in which trauma serves to inhibit the normal integration sequence of parallel processes of the self in childhood. This latter paradigm is shown to be broadly consistent with current models of cortical processing in another system, the cortical visual system.

  6. New computing systems, future computing environment, and their implications on structural analysis and design

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K.; Housner, Jerrold M.

    1993-01-01

    Recent advances in computer technology that are likely to impact structural analysis and design of flight vehicles are reviewed. A brief summary is given of the advances in microelectronics, networking technologies, and in the user-interface hardware and software. The major features of new and projected computing systems, including high performance computers, parallel processing machines, and small systems, are described. Advances in programming environments, numerical algorithms, and computational strategies for new computing systems are reviewed. The impact of the advances in computer technology on structural analysis and the design of flight vehicles is described. A scenario for future computing paradigms is presented, and the near-term needs in the computational structures area are outlined.

  7. Distributed Parallel Processing and Dynamic Load Balancing Techniques for Multidisciplinary High Speed Aircraft Design

    NASA Technical Reports Server (NTRS)

    Krasteva, Denitza T.

    1998-01-01

    Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.) This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.

  8. The Neurological Basis and Potential Modification of Emotional Intelligence through Affective/Behavioral Training

    DTIC Science & Technology

    2010-10-01

    facial trustworthiness; facial displays of anger) presented subliminally. Furthermore, the responsiveness of these regions to subliminal stimulation ...develop, or program the computerized stimulation paradigms for use during functional neuroimaging (i.e., MJT; BMAT; EFAT). These paradigms will be...programming began on the computerized functional MRI stimulation paradigms using E-Prime software. • Quarter #2: Programming of all computerized functional

  9. In-situ Isotopic Analysis at Nanoscale using Parallel Ion Electron Spectrometry: A Powerful New Paradigm for Correlative Microscopy

    NASA Astrophysics Data System (ADS)

    Yedra, Lluís; Eswara, Santhana; Dowsett, David; Wirtz, Tom

    2016-06-01

    Isotopic analysis is of paramount importance across the entire gamut of scientific research. To advance the frontiers of knowledge, a technique for nanoscale isotopic analysis is indispensable. Secondary Ion Mass Spectrometry (SIMS) is a well-established technique for analyzing isotopes, but its spatial resolution is fundamentally limited. Transmission Electron Microscopy (TEM) is a well-known method for high-resolution imaging down to the atomic scale. However, isotopic analysis in TEM is not possible. Here, we introduce a powerful new paradigm for in-situ correlative microscopy called Parallel Ion Electron Spectrometry, created by synergizing SIMS with TEM. We demonstrate this technique by distinguishing lithium carbonate nanoparticles according to the isotopic label of lithium, viz. 6Li and 7Li, and imaging them at high resolution by TEM, adding a new dimension to correlative microscopy.

  10. A Specification for a Godunov-type Eulerian 2-D Hydrocode, Revision 0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nystrom, William D; Robey, Jonathan M

    2012-05-01

    The purpose of this code specification is to describe an algorithm for solving the Euler equations of hydrodynamics in a 2D rectangular region in sufficient detail to allow a software developer to produce an implementation on their target platform using their programming language of choice without requiring detailed knowledge and experience in the field of computational fluid dynamics. It should be possible for a software developer who is proficient in the programming language of choice and is knowledgeable of the target hardware to produce an efficient implementation of this specification if they also possess a thorough working knowledge of parallel programming and have some experience in scientific programming using fields and meshes. On modern architectures, it will be important to focus on issues related to the exploitation of the fine grain parallelism and data locality present in this algorithm. This specification aims to make that task easier by presenting the essential details of the algorithm in a systematic and language neutral manner while also avoiding the inclusion of implementation details that would likely be specific to a particular type of programming paradigm or platform architecture.

  11. JANUS: A Compilation System for Balancing Parallelism and Performance in OpenVX

    NASA Astrophysics Data System (ADS)

    Omidian, Hossein; Lemieux, Guy G. F.

    2018-04-01

    Embedded systems typically do not have enough on-chip memory for an entire image buffer. Programming systems like OpenCV operate on entire image frames at each step, making them use excessive memory bandwidth and power. In contrast, the paradigm used by OpenVX is much more efficient; it uses image tiling, and the compilation system is allowed to analyze and optimize the operation sequence, specified as a compute graph, before doing any pixel processing. In this work, we are building a compilation system for OpenVX that can analyze and optimize the compute graph to take advantage of parallel resources in many-core systems or FPGAs. Using a database of prewritten OpenVX kernels, it automatically adjusts the image tile size as well as using kernel duplication and coalescing to meet a defined area (resource) target, or to meet a specified throughput target. This allows a single compute graph to target implementations with a wide range of performance needs or capabilities (e.g., from handheld to datacenter) while using minimal resources and power to reach the performance target.

  12. Microworlds for Learning Object-Oriented Programming: Considerations from Research to Practice

    ERIC Educational Resources Information Center

    Djelil, Fahima; Albouy-Kissi, Adelaide; Albouy-Kissi, Benjamin; Sanchez, Eric; Lavest, Jean-Marc

    2016-01-01

    The Object-Oriented paradigm is a common choice for introductory programming courses. However, many teachers find that transitioning to teaching this paradigm is a difficult task. To overcome this complexity, many experienced teachers use microworlds to give beginner students an intuitive and rapid understanding of fundamental abstract concepts of…

  13. A Tutorial on Parallel and Concurrent Programming in Haskell

    NASA Astrophysics Data System (ADS)

    Peyton Jones, Simon; Singh, Satnam

    This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs, allowing programmers to use rich data types in data parallel programs that are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.

  14. Converging Paradigms: A Reflection on Parallel Theoretical Developments in Psychoanalytic Metapsychology and Empirical Dream Research.

    PubMed

    Schmelowszky, Ágoston

    2016-08-01

    In the last decades one can perceive a striking parallelism between the shifting perspective of leading representatives of empirical dream research concerning their conceptualization of dreaming and the paradigm shift within clinically based psychoanalytic metapsychology with respect to its theory on the significance of dreaming. In metapsychology, dreaming becomes more and more a central metaphor of mental functioning in general. The theories of Klein, Bion, and Matte-Blanco can be considered as milestones of this paradigm shift. In empirical dream research, the competing theories of Hobson and of Solms respectively argued for and against the meaningfulness of the dream-work in the functioning of the mind. In the meantime, empirical data coming from various sources seemed to prove the significance of dream consciousness for the development and maintenance of adaptive waking consciousness. Metapsychological speculations and hypotheses based on empirical research data seem to point in the same direction, promising for contemporary psychoanalytic practice a more secure theoretical base. In this paper the author brings together these diverse theoretical developments and presents conclusions regarding psychoanalytic theory and technique, as well as proposing an outline of an empirical research plan for testing the specificity of psychoanalysis in developing dream formation.

  15. Separating stages of arithmetic verification: An ERP study with a novel paradigm.

    PubMed

    Avancini, Chiara; Soltész, Fruzsina; Szűcs, Dénes

    2015-08-01

    In studies of arithmetic verification, participants typically encounter two operands and they carry out an operation on these (e.g. adding them). Operands are followed by a proposed answer and participants decide whether this answer is correct or incorrect. However, interpretation of results is difficult because multiple parallel, temporally overlapping numerical and non-numerical processes of the human brain may contribute to task execution. In order to overcome this problem here we used a novel paradigm specifically designed to tease apart the overlapping cognitive processes active during arithmetic verification. Specifically, we aimed to separate effects related to detection of arithmetic correctness, detection of the violation of strategic expectations, detection of physical stimulus properties mismatch and numerical magnitude comparison (numerical distance effects). Arithmetic correctness, physical stimulus properties and magnitude information were not task-relevant properties of the stimuli. We distinguished between a series of temporally highly overlapping cognitive processes which in turn elicited overlapping ERP effects with distinct scalp topographies. We suggest that arithmetic verification relies on two major temporal phases which include parallel running processes. Our paradigm offers a new method for investigating specific arithmetic verification processes in detail. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Numerical methods on some structured matrix algebra problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jessup, E.R.

    1996-06-01

    This proposal concerned the design, analysis, and implementation of serial and parallel algorithms for certain structured matrix algebra problems. It emphasized large order problems and so focused on methods that can be implemented efficiently on distributed-memory MIMD multiprocessors. Such machines supply the computing power and extensive memory demanded by the large order problems. We proposed to examine three classes of matrix algebra problems: the symmetric and nonsymmetric eigenvalue problems (especially the tridiagonal cases) and the solution of linear systems with specially structured coefficient matrices. As all of these are of practical interest, a major goal of this work was to translate our research in linear algebra into useful tools for use by the computational scientists interested in these and related applications. Thus, in addition to software specific to the linear algebra problems, we proposed to produce a programming paradigm and library to aid in the design and implementation of programs for distributed-memory MIMD computers. We now report on our progress on each of the problems and on the programming tools.

  17. GPU-completeness: theory and implications

    NASA Astrophysics Data System (ADS)

    Lin, I.-Jong

    2011-01-01

    This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We define this class of algorithms as "GPU-Complete" and postulate the necessary properties of an algorithm for admission into this class. We also formally relate this algorithmic space to the space of imaging algorithms. This concept is based upon our experience in the print production area, where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HP-delivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for the CPU, language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for the GPU, video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs are establishing themselves as a second computing architecture distinct from CPUs, their end-to-end system cost/performance advantage in certain parts of computation informs the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrates the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a well-defined subset of algorithms, GPU-Completeness is intended to connect parallelism, algorithms, and efficient architectures into a unified framework, showing that multiple layers of parallel implementation are guided by the same underlying trade-off.

  18. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  19. Advances in Parallel Computing and Databases for Digital Pathology in Cancer Research

    DTIC Science & Technology

    2016-11-13

    these technologies and how we have used them in the past. We are interested in learning more about the needs of clinical pathologists as we continue to...such as image processing and correlation. Further, High Performance Computing (HPC) paradigms such as the Message Passing Interface (MPI) have been...Defense for Research and Engineering. such as pMatlab [4], or bcMPI [5] can significantly reduce the need for deep knowledge of parallel computing. In

  20. High-Performance Signal Detection for Adverse Drug Events using MapReduce Paradigm.

    PubMed

    Fan, Kai; Sun, Xingzhi; Tao, Ying; Xu, Linhao; Wang, Chen; Mao, Xianling; Peng, Bo; Pan, Yue

    2010-11-13

    Post-marketing pharmacovigilance is important for public health, as many Adverse Drug Events (ADEs) are unknown when those drugs are approved for marketing. However, due to the large number of reported drugs and drug combinations, detecting ADE signals by mining these reports is becoming a challenging task in terms of computational complexity. Recently, a parallel programming model, MapReduce, has been introduced by Google to support large-scale data-intensive applications. In this study, we proposed a MapReduce-based algorithm for a common ADE detection approach, the Proportional Reporting Ratio (PRR), and tested it in mining spontaneous ADE reports from the FDA. The purpose is to investigate the possibility of using the MapReduce principle to speed up biomedical data mining tasks, using this pharmacovigilance case as one specific example. The results demonstrated that the MapReduce programming model could improve the performance of a common signal detection algorithm for pharmacovigilance in a distributed computation environment at approximately linear speedup rates.
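
    To make the detection step concrete, the toy sketch below counts (drug, event) co-occurrences in map/reduce style and then applies the standard PRR ratio [a/(a+b)] / [c/(c+d)] from the 2x2 contingency table. It runs serially in Python for clarity; the report contents are invented, and the actual algorithm distributes the counting across Hadoop nodes.

      from collections import Counter
      from itertools import product

      def count_pairs(reports):
          """Map: emit a (drug, event) key per co-occurrence; reduce: sum counts."""
          counts = Counter()
          for drugs, events in reports:
              counts.update(product(drugs, events))
          return counts

      def prr(counts, drug, event):
          """PRR = [a/(a+b)] / [c/(c+d)] for the target drug and event."""
          a = counts[(drug, event)]
          a_b = sum(n for (d, _), n in counts.items() if d == drug)
          c = sum(n for (d, e), n in counts.items() if d != drug and e == event)
          c_d = sum(n for (d, _), n in counts.items() if d != drug)
          return (a / a_b) / (c / c_d)

      reports = [({"drugX"}, {"nausea"}), ({"drugX"}, {"rash"}),
                 ({"drugY"}, {"nausea"}), ({"drugY"}, {"headache"})]
      print(prr(count_pairs(reports), "drugX", "nausea"))  # 1.0 for this toy data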

  1. Feature Discovery by Competitive Learning.

    ERIC Educational Resources Information Center

    Rumelhart, David E.; Zipser, David

    1985-01-01

    Reports results of studies with an unsupervised learning paradigm called competitive learning which is examined using computer simulation and formal analysis. When competitive learning is applied to parallel networks of neuron-like elements, many potentially useful learning tasks can be accomplished. (Author)

  2. schwimmbad: A uniform interface to parallel processing pools in Python

    NASA Astrophysics Data System (ADS)

    Price-Whelan, Adrian M.; Foreman-Mackey, Daniel

    2017-09-01

    Many scientific and computing problems require doing some calculation on all elements of some data set. If the calculations can be executed in parallel (i.e. without any communication between calculations), these problems are said to be perfectly parallel. On computers with multiple processing cores, these tasks can be distributed and executed in parallel to greatly improve performance. A common paradigm for handling these distributed computing problems is to use a processing "pool": the "tasks" (the data) are passed in bulk to the pool, and the pool handles distributing the tasks to a number of worker processes when available. schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib).
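
    A short usage sketch of the pool-switching idea follows; the worker function and task list are invented, and choose_pool's arguments follow the schwimmbad documentation.

      import schwimmbad

      def worker(x):
          return x ** 2                 # any perfectly parallel per-item calculation

      def main(pool):
          results = list(pool.map(worker, range(100)))
          print(sum(results))

      if __name__ == "__main__":
          # SerialPool, MultiPool (multiprocessing) or MPIPool sit behind one
          # interface; moving from a laptop to a cluster is an argument change.
          pool = schwimmbad.choose_pool(mpi=False, processes=4)
          main(pool)
          pool.close()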

  3. Multidisciplinary Design Optimization (MDO) Methods: Their Synergy with Computer Technology in Design Process

    NASA Technical Reports Server (NTRS)

    Sobieszczanski-Sobieski, Jaroslaw

    1998-01-01

    The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate; a radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimization (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behavior by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.

  4. Multidisciplinary Design Optimisation (MDO) Methods: Their Synergy with Computer Technology in the Design Process

    NASA Technical Reports Server (NTRS)

    Sobieszczanski-Sobieski, Jaroslaw

    1999-01-01

    The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.

  5. Low-Speed Investigation of Upper-Surface Leading-Edge Blowing on a High-Speed Civil Transport Configuration

    NASA Technical Reports Server (NTRS)

    Banks, Daniel W.; Laflin, Brenda E. Gile; Kemmerly, Guy T.; Campbell, Bryan A.

    1999-01-01

  6. Sociomoral Atmosphere in Direct-Instruction, Eclectic, and Constructivist Kindergartens: A Study of Teachers' Enacted Interpersonal Understanding.

    ERIC Educational Resources Information Center

    DeVries, Rheta; And Others

    This study examined the interactions between teachers and children in three kindergarten classrooms. Programs used in the classrooms were: a direct-instruction (DI) program, representing a cultural transmission paradigm; a constructivist program (CON), representing the cognitive-developmental paradigm; and an eclectic program (ECL), combining…

  7. An Object-Oriented Approach to Writing Computational Electromagnetics Codes

    NASA Technical Reports Server (NTRS)

    Zimmerman, Martin; Mallasch, Paul G.

    1996-01-01

    Presently, most computer software development in the Computational Electromagnetics (CEM) community employs the structured programming paradigm, particularly using the Fortran language. Other segments of the software community began switching to an Object-Oriented Programming (OOP) paradigm in recent years to help ease design and development of highly complex codes. This paper examines design of a time-domain numerical analysis CEM code using the OOP paradigm, comparing OOP code and structured programming code in terms of software maintenance, portability, flexibility, and speed.

  8. Using the Statecharts paradigm for simulation of patient flow in surgical care.

    PubMed

    Sobolev, Boris; Harel, David; Vasilakis, Christos; Levy, Adrian

    2008-03-01

    Computer simulation of patient flow has been used extensively to assess the impacts of changes in the management of surgical care. However, little research is available on the utility of existing modeling techniques. The purpose of this paper is to examine the capacity of Statecharts, a system of graphical specification, for constructing a discrete-event simulation model of the perioperative process. The Statecharts specification paradigm was originally developed for representing reactive systems by extending the formalism of finite-state machines through notions of hierarchy, parallelism, and event broadcasting. Hierarchy permits subordination between states so that one state may contain other states. Parallelism permits more than one state to be active at any given time. Broadcasting of events allows one state to detect changes in another state. In the context of the peri-operative process, hierarchy provides the means to describe steps within activities and to cluster related activities, parallelism provides the means to specify concurrent activities, and event broadcasting provides the means to trigger a series of actions in one activity according to transitions that occur in another activity. Combined with hierarchy and parallelism, event broadcasting offers a convenient way to describe the interaction of concurrent activities. We applied the Statecharts formalism to describe the progress of individual patients through surgical care as a series of asynchronous updates in patient records generated in reaction to events produced by parallel finite-state machines representing concurrent clinical and managerial activities. We conclude that Statecharts capture successfully the behavioral aspects of surgical care delivery by specifying permissible chronology of events, conditions, and actions.
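
    As a toy illustration of two of the three mechanisms the authors rely on (parallel regions and event broadcasting; hierarchy is omitted for brevity), the sketch below runs concurrent clinical and managerial state machines that both react to broadcast events. The states and events are invented stand-ins, not the authors' model.

      class Region:
          """One concurrent activity: a flat state machine with transitions."""
          def __init__(self, name, initial, transitions):
              self.name, self.state = name, initial
              self.transitions = transitions      # {(state, event): next_state}

          def handle(self, event):
              key = (self.state, event)
              if key in self.transitions:
                  self.state = self.transitions[key]
                  return True
              return False

      class Chart:
          """Parallel regions; every event is broadcast to all of them."""
          def __init__(self, *regions):
              self.regions = list(regions)

          def broadcast(self, event):
              for region in self.regions:         # all regions see every event
                  if region.handle(event):
                      print(f"{region.name}: -> {region.state}")

      clinical = Region("clinical", "ward",
                        {("ward", "to_or"): "operating_room",
                         ("operating_room", "surgery_done"): "recovery"})
      managerial = Region("managerial", "scheduled",
                          {("scheduled", "to_or"): "or_in_use",
                           ("or_in_use", "surgery_done"): "or_free"})

      chart = Chart(clinical, managerial)
      for event in ("to_or", "surgery_done"):
          chart.broadcast(event)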

  9. Parallel implementation of approximate atomistic models of the AMOEBA polarizable model

    NASA Astrophysics Data System (ADS)

    Demerdash, Omar; Head-Gordon, Teresa

    2016-11-01

    In this work we present a replicated data hybrid OpenMP/MPI implementation of a hierarchical progression of approximate classical polarizable models that yields speedups of up to ∼10 compared to the standard OpenMP implementation of the exact parent AMOEBA polarizable model. In addition, our parallel implementation exhibits reasonable weak and strong scaling. The resulting parallel software will prove useful for those who are interested in how molecular properties converge in the condensed phase with respect to the many-body expansion (MBE); it also provides a fruitful test bed for exploring different electrostatic embedding schemes, and offers an interesting possibility for future exascale computing paradigms.

  10. Cost-estimating relationships for space programs

    NASA Technical Reports Server (NTRS)

    Mandell, Humboldt C., Jr.

    1992-01-01

    Cost-estimating relationships (CERs) are defined and discussed as they relate to the estimation of theoretical costs for space programs. The paper primarily addresses CERs based on analogous relationships between physical and performance parameters to estimate future costs. Analytical estimation principles are reviewed, examining the sources of errors in cost models, and the use of CERs is shown to be affected by organizational culture. Two paradigms for cost estimation are set forth: (1) the Rand paradigm for single-culture, single-system methods; and (2) the Price paradigms, which incorporate a set of cultural variables. For space programs that are potentially subject to even small cultural changes, the Price paradigms are argued to be more effective. The derivation and use of accurate CERs is important for developing effective cost models to analyze the potential of a given space program.

  11. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
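
    To make the comparison concrete, here is a toy, single-process sketch of Linda's coordination primitives: out deposits a tuple and take (Linda's "in") withdraws one by pattern, blocking until a match exists. Real Linda distributes the tuple space across machines, so this illustrates only the programming model, not the Hypercube implementation.

      import threading

      class TupleSpace:
          def __init__(self):
              self._tuples = []
              self._cv = threading.Condition()

          def out(self, tup):                     # deposit a tuple
              with self._cv:
                  self._tuples.append(tup)
                  self._cv.notify_all()

          def _match(self, pattern):              # None in a pattern is a wildcard
              for t in self._tuples:
                  if len(t) == len(pattern) and all(
                          p is None or p == v for p, v in zip(pattern, t)):
                      return t
              return None

          def take(self, pattern):                # remove a match, block if none
              with self._cv:
                  while (t := self._match(pattern)) is None:
                      self._cv.wait()
                  self._tuples.remove(t)
                  return t

      space = TupleSpace()
      space.out(("compare", "seqA", "seqB"))      # master posts a comparison task
      print(space.take(("compare", None, None)))  # a worker claims it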

  12. Parallel programming with Easy Java Simulations

    NASA Astrophysics Data System (ADS)

    Esquembre, F.; Christian, W.; Belloni, M.

    2018-01-01

    Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.

  13. An Approach Using Parallel Architecture to Storage DICOM Images in Distributed File System

    NASA Astrophysics Data System (ADS)

    Soares, Tiago S.; Prado, Thiago C.; Dantas, M. A. R.; de Macedo, Douglas D. J.; Bauer, Michael A.

    2012-02-01

    Telemedicine is a very important area of the medical field that is expanding daily, motivated by many researchers interested in improving medical applications. In Brazil, development began in 2005, in the State of Santa Catarina, of a server called CyclopsDCMServer, whose purpose is to use HDF for the manipulation of medical images (DICOM) over a distributed file system. Since then, many studies have been initiated in order to seek better performance. Our approach adds a parallel implementation of I/O operations to this server, since HDF version 5 has a feature essential for our work: it supports parallel I/O based upon the MPI paradigm. Early experiments using four parallel nodes show good performance when compared to the serial HDF implementation in the CyclopsDCMServer.
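
    The parallel I/O feature the server builds on can be exercised from Python; the minimal sketch below assumes h5py built against a parallel HDF5 and mpi4py available, and the file name, dataset name, and shapes are illustrative rather than the CyclopsDCMServer schema.

      from mpi4py import MPI
      import h5py
      import numpy as np

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      # Open one file collectively via the MPI-IO driver; each rank then writes
      # its own slice of the dataset without funneling through a single process.
      with h5py.File("images.h5", "w", driver="mpio", comm=comm) as f:
          dset = f.create_dataset("pixel_data", shape=(size, 512, 512), dtype="u2")
          dset[rank] = np.zeros((512, 512), dtype="u2")   # this rank's image slice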

  14. Distributed and parallel approach for handle and perform huge datasets

    NASA Astrophysics Data System (ADS)

    Konopko, Joanna

    2015-12-01

    Big Data refers to the dynamic, large and disparate volumes of data created by many different sources (tools, machines, sensors, mobile devices), uncorrelated with each other. It requires new, innovative and scalable technology to collect, host and analytically process the vast amount of data. A proper architecture is needed for systems that process such huge data sets. In this paper, distributed and parallel system architectures are compared using the example of the MapReduce (MR) Hadoop platform and a parallel database platform (DBMS). The paper also analyzes the problem of extracting valuable information from petabytes of data. Both paradigms, MapReduce and parallel DBMS, are described and compared. A hybrid architecture approach is also proposed that could be used to solve the analyzed problem of storing and processing Big Data.

  15. A software architecture for multidisciplinary applications: Integrating task and data parallelism

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Vanrosendale, John; Zima, Hans

    1994-01-01

    Data parallel languages such as Vienna Fortran and HPF can be successfully applied to a wide range of numerical applications. However, many advanced scientific and engineering applications are of a multidisciplinary and heterogeneous nature and thus do not fit well into the data parallel paradigm. In this paper we present new Fortran 90 language extensions to fill this gap. Tasks can be spawned as asynchronous activities in a homogeneous or heterogeneous computing environment; they interact by sharing access to Shared Data Abstractions (SDA's). SDA's are an extension of Fortran 90 modules, representing a pool of common data, together with a set of Methods for controlled access to these data and a mechanism for providing persistent storage. Our language supports the integration of data and task parallelism as well as nested task parallelism and thus can be used to express multidisciplinary applications in a natural and efficient way.

  17. Bilingual parallel programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, I.; Overbeek, R.

    1990-01-01

    Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seems to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.

  18. Opus: A Coordination Language for Multidisciplinary Applications

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Haines, Matthew; Mehrotra, Piyush; Zima, Hans; vanRosendale, John

    1997-01-01

    Data parallel languages, such as High Performance Fortran, can be successfully applied to a wide range of numerical applications. However, many advanced scientific and engineering applications are multidisciplinary and heterogeneous in nature, and thus do not fit well into the data parallel paradigm. In this paper we present Opus, a language designed to fill this gap. The central concept of Opus is a mechanism called ShareD Abstractions (SDA). An SDA can be used as a computation server, i.e., a locus of computational activity, or as a data repository for sharing data between asynchronous tasks. SDAs can be internally data parallel, providing support for the integration of data and task parallelism as well as nested task parallelism. They can thus be used to express multidisciplinary applications in a natural and efficient way. In this paper we describe the features of the language through a series of examples and give an overview of the runtime support required to implement these concepts in parallel and distributed environments.

  19. A Goniometry Paradigm Shift to Measure Burn Scar Contracture in Burn Patients

    DTIC Science & Technology

    2016-10-01

    ...spending will increase in parallel with on-site study training and remuneration for submitted data. Significant changes in use or care of human

  20. The Parallel Episodic Processing (PEP) model: dissociating contingency and conflict adaptation in the item-specific proportion congruent paradigm.

    PubMed

    Schmidt, James R

    2013-01-01

    The present work introduces a computational model, the Parallel Episodic Processing (PEP) model, which demonstrates that contingency learning achieved via simple storage and retrieval of episodic memories can explain the item-specific proportion congruency effect in the colour-word Stroop paradigm. The current work also presents a new experimental procedure to more directly dissociate contingency biases from conflict adaptation (i.e., proportion congruency). This was done with three different types of incongruent words that allow a comparison of: (a) high versus low contingency while keeping proportion congruency constant, and (b) high versus low proportion congruency while keeping contingency constant. Results demonstrated a significant contingency effect, but no effect of proportion congruence. It was further shown that the proportion congruency associated with the colour does not matter, either. Thus, the results quite directly demonstrate that ISPC effects are not due to conflict adaptation, but instead to contingency learning biases. Copyright © 2012 Elsevier B.V. All rights reserved.
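
    To give a flavor of the storage-and-retrieval mechanism, the toy sketch below stores a (word, response) episode on each trial and lets retrieved episodes for the current word speed the frequently paired response. It is an illustration of the idea only, not the published PEP implementation; all parameters and stimuli are invented.

      import random
      from collections import Counter

      episodes = []                        # episodic memory: (word, response) traces

      def trial(word, response, base_rt=600.0):
          retrieved = Counter(r for w, r in episodes if w == word)
          # Retrieval benefit: the more often this word predicted this response,
          # the faster the response -- a contingency bias, not conflict adaptation.
          rt = base_rt - 20.0 * retrieved[response] + random.gauss(0, 10)
          episodes.append((word, response))
          return rt

      random.seed(1)
      for _ in range(20):                  # "MOVE" is presented mostly in blue
          trial("MOVE", "blue")
      print(round(trial("MOVE", "blue")))  # fast: high-contingency pairing
      print(round(trial("MOVE", "green"))) # slower: low-contingency pairing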

  1. Application Portable Parallel Library

    NASA Technical Reports Server (NTRS)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  2. Understanding paradigms used for nursing research.

    PubMed

    Weaver, Kathryn; Olson, Joanne K

    2006-02-01

    The aims of this paper are to add clarity to the discussion about paradigms for nursing research and to consider integrative strategies for the development of nursing knowledge. Paradigms are sets of beliefs and practices, shared by communities of researchers, which regulate inquiry within disciplines. The various paradigms are characterized by ontological, epistemological and methodological differences in their approaches to conceptualizing and conducting research, and in their contribution towards disciplinary knowledge construction. Researchers may consider these differences so vast that one paradigm is incommensurable with another. Alternatively, researchers may ignore these differences and either unknowingly combine paradigms inappropriately or neglect to conduct needed research. To accomplish the task of developing nursing knowledge for use in practice, there is a need for a critical, integrated understanding of the paradigms used for nursing inquiry. We describe the evolution and influence of positivist, postpositivist, interpretive and critical theory research paradigms. Using integrative review, we compare and contrast the paradigms in terms of their philosophical underpinnings and scientific contribution. A pragmatic approach to theory development through synthesis of cumulative knowledge relevant to nursing practice is suggested. This requires that inquiry start with assessment of existing knowledge from disparate studies to identify key substantive content and gaps. Knowledge development in under-researched areas could be accomplished through integrative strategies that preserve theoretical integrity and strengthen research approaches associated with various philosophical perspectives. These strategies may include parallel studies within the same substantive domain using different paradigms; theoretical triangulation to combine findings from paradigmatically diverse studies; integrative reviews; and mixed method studies. Nurse scholars are urged to consider the benefits and limitations of inquiry within each paradigm, and the theoretical needs of the discipline.

  3. Interest of new communicating material paradigm: An attempt in wood industry

    NASA Astrophysics Data System (ADS)

    Jover, J.; Thomas, A.; Leban, J. M.; Canet, D.

    2013-03-01

    This paper presents a new paradigm in which wood could become a communicating material. We decided to use Nuclear Quadrupole Resonance for mass marking of the wood, and devised a new method for creating identification codes. We first examine the feasibility of this mass-marking method by impregnating wood to obtain a specific marking signal. In parallel, we study the value of the information provided by this marker for controlling the supply chain. We model the supply chain (i.e., the information and decisional flows) to understand which information is important and how to use it.

  4. Psyche/Logos: Mapping the Terrains of Mind and Rhetoric.

    ERIC Educational Resources Information Center

    Baumlin, James S.; Baumlin, Tita French

    1989-01-01

    Discusses rhetoric as mirroring psychology. Examines Aristotle's three "pisteis"--the pathetic, logical, and ethical proofs, paralleling them to Freud's id, ego, and super-ego. Explores an adequate feminine psychology and a corresponding rhetoric. Outlines two models of persuasive discourse, the rational world paradigm and the narrative…

  5. On the 'principle of the quantumness', the quantumness of Relativity, and the computational grand-unification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Ariano, Giacomo Mauro

    2010-05-04

    I will argue that the proposal of establishing operational foundations of Quantum Theory should have top priority, and that Lucien Hardy's program on Quantum Gravity should be paralleled by an analogous program on Quantum Field Theory (QFT), which needs to be reformulated, notwithstanding its experimental success. In this paper, after reviewing recently suggested operational 'principles of the quantumness', I address the problem of whether Quantum Theory and Special Relativity are unrelated theories, or instead, if the one implies the other. I show how Special Relativity can indeed be derived from causality of Quantum Theory, within the computational paradigm 'the universe is a huge quantum computer', reformulating QFT as a Quantum-Computational Field Theory (QCFT). In QCFT Special Relativity emerges from the fabric of the computational network, which also naturally embeds gauge invariance. In this scheme even the quantization rule and the Planck constant can in principle be derived as emergent from the underlying causal tapestry of space-time. In this way Quantum Theory remains the only theory operating the huge computer of the universe. Is the computational paradigm only a speculative tautology (theory as simulation of reality), or does it have a scientific value? The answer will come from Occam's razor, depending on the mathematical simplicity of QCFT. Here I will just start scratching the surface of QCFT, analyzing simple field theories, including Dirac's. The number of problems and unmotivated recipes that plague QFT strongly motivates us to undertake the QCFT project, since QCFT makes all such problems manifest, and forces a re-foundation of QFT.

  6. Understanding the Cray X1 System

    NASA Technical Reports Server (NTRS)

    Cheung, Samson

    2004-01-01

    This paper helps the reader understand the characteristics of the Cray X1 vector supercomputer system, and provides hints and information to enable the reader to port codes to the system. It provides a comparison between the basic performance of the X1 platform and other platforms that are available at NASA Ames Research Center. A set of codes, solving the Laplacian equation with different parallel paradigms, is used to understand some features of the X1 compiler. An example code from the NAS Parallel Benchmarks is used to demonstrate performance optimization on the X1 platform.
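
    For orientation, the kernel such comparison codes implement is easy to state. Below is a minimal vectorized Jacobi sweep for Laplace's equation in Python/NumPy; the grid size and boundary values are illustrative, and the benchmark suite itself implements variants of this kernel under the different parallel paradigms.

      import numpy as np

      def jacobi(u, iters):
          """Repeatedly replace interior points by the average of 4 neighbors."""
          for _ in range(iters):
              u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                      u[1:-1, :-2] + u[1:-1, 2:])
          return u

      u = np.zeros((128, 128))
      u[0, :] = 100.0                      # one hot boundary edge
      print(jacobi(u, 500)[64, 64])        # interior value after 500 sweeps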

  7. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

    PubMed Central

    2011-01-01

    Background Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538

  8. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.

    PubMed

    Cieślik, Marcin; Mura, Cameron

    2011-02-25

    Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
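
    A generic sketch of the dataflow idea in plain Python follows; it mirrors the paradigm described above but is not PaPy's actual API. A user-written parser feeds a per-item scoring component through a data-pipe, the pool evaluates the scoring step in parallel, and imap keeps evaluation lazy so memory stays bounded.

      from multiprocessing import Pool

      def parse(lines):                    # component 1: normalize records
          for line in lines:
              yield line.strip().upper()

      def score(seq):                      # component 2: the parallel per-item step
          return seq, len(seq)

      if __name__ == "__main__":
          lines = ["acgt\n", "ggccta\n", "tt\n"]
          with Pool(2) as pool:
              # parse feeds score through a pipe; imap evaluates lazily in batches
              for seq, length in pool.imap(score, parse(lines), chunksize=1):
                  print(seq, length)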

  9. Partitioning problems in parallel, pipelined and distributed computing

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1985-01-01

    The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
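
    The flavor of these partitioning problems is easy to demonstrate: split a chain of module weights into p contiguous blocks so that the largest block sum (the bottleneck) is minimized. The sketch below uses the standard binary-search-plus-greedy technique, offered as an illustration of the problem, not Bokhari's Sum-Bottleneck path algorithm itself.

      def blocks_needed(weights, limit):
          """Greedy: fewest contiguous blocks if no block sum may exceed limit."""
          blocks, current = 1, 0
          for w in weights:
              if current + w > limit:      # close this block, start a new one
                  blocks, current = blocks + 1, 0
              current += w
          return blocks

      def min_bottleneck(weights, p):
          lo, hi = max(weights), sum(weights)
          while lo < hi:                   # binary-search the smallest feasible limit
              mid = (lo + hi) // 2
              if blocks_needed(weights, mid) <= p:
                  hi = mid
              else:
                  lo = mid + 1
          return lo

      print(min_bottleneck([3, 1, 4, 1, 5, 9, 2, 6], p=3))   # -> 14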

  10. Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.

  11. A Survey of Current Trends in Master's Programs in Microelectronics

    ERIC Educational Resources Information Center

    Bozanic, Mladen; Sinha, Saurabh

    2018-01-01

    Contribution: This paper brings forward a paradigm shift in microelectronic and nanoelectronic engineering education. Background: An increasing number of universities are offering graduate-level electrical engineering degree programs with multi-disciplinary Master's-level specialization in microelectronics or nanoelectronics. The paradigm shift…

  12. The New Rule Paradigm Shift: Transforming At-Risk Programs by Matching Business Archetypes Strategies in the Global Market

    ERIC Educational Resources Information Center

    Stark, Paul S.

    2007-01-01

    The challenge was given to transform aviation-related programs to keep them from being eliminated. These programs were to be discontinued due to enrollment declines, costs, legislative mandates, lack of administrative support, and drastic state budget reductions. The New Rule was a paradigm shift of focus to the global market for program…

  13. Porting Gravitational Wave Signal Extraction to Parallel Virtual Machine (PVM)

    NASA Technical Reports Server (NTRS)

    Thirumalainambi, Rajkumar; Thompson, David E.; Redmon, Jeffery

    2009-01-01

    Laser Interferometer Space Antenna (LISA) is a planned NASA-ESA mission to be launched around 2012. Gravitational wave detection is fundamentally the determination of frequency, source parameters, and waveform amplitude, derived in a specific order from the interferometric time-series of the rotating LISA spacecraft. The LISA Science Team has developed a Mock LISA Data Challenge intended to promote the testing of complicated nested search algorithms to detect the 100-1 millihertz frequency signals at amplitudes of 10E-21. However, it has become clear that sequential search of the parameters is very time consuming and ultra-sensitive; hence, a new strategy has been developed. Parallelization of existing sequential search algorithms for gravitational wave signal identification consists of decomposing sequential search loops, beginning with the outermost loops and working inward. In this process, the main challenge is to detect interdependencies among loops and to partition the loops so as to preserve concurrency. Existing parallel programs are based upon either shared-memory or distributed-memory paradigms. In PVM, master and node programs are used to execute parallelization and process spawning. PVM can handle process management and process addressing schemes using a virtual machine configuration. The task scheduling and the messaging and signaling can be implemented efficiently for the LISA gravitational wave search process using a master and 6 nodes. This approach is accomplished using a server available at NASA Ames Research Center that has been dedicated to the LISA Data Challenge Competition. Historically, extracting gravitational wave and source identification parameters has taken around 7 days on this dedicated single-threaded Linux-based server. Using the PVM approach, the parameter extraction problem can be reduced to within a day. The low-frequency computation and a proxy signal-to-noise ratio are calculated in separate nodes that are controlled by the master using message and data-vector passing. The message passing among nodes follows a pattern of synchronous and asynchronous send-and-receive protocols. The communication model and the message buffers are allocated dynamically to support rapid search of gravitational wave source information in the Mock LISA data sets.
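
    The master/worker structure described above can be sketched compactly; the example below uses Python's multiprocessing as a stand-in for PVM's spawn and message-passing calls, and the task contents are placeholders rather than the actual search parameters.

      from multiprocessing import Pool

      def node_program(band):
          """Each node searches one frequency band and reports a candidate."""
          # ... the nested search over source parameters would run here ...
          return band, f"best-candidate-in-{band}"

      if __name__ == "__main__":
          bands = [f"band-{i}" for i in range(6)]     # six nodes, as in the record
          with Pool(processes=6) as master:           # master spawns and collects
              for band, candidate in master.imap_unordered(node_program, bands):
                  print(band, candidate)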

  14. Natural Capital Management: An Evolutionary Paradigm for Sustainable Restoration Investment - 13455

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koetz, Maureen T.

    2013-07-01

    Unlike other forms of capital assets (built infrastructure, labor, financial capital), the supply of usable or accessible air, land, and water elements (termed Natural Capital Assets or NCA) available to enterprise processes is structurally shrinking due to increased demand and regulatory restriction. This supply/demand imbalance is affecting all forms of public and private enterprise (including Federal Facilities) in the form of encroachment, production limits, cost increases, and reduced competitiveness. Department of Energy (DOE) sites are comprised of significant stocks of NCA that function as both conserved capital (providing ecosystem services and other reserve capacity), and as natural infrastructure (supporting major Federal enterprise programs). The current rubric of 'Environmental Stewardship' provides an unduly constrained management paradigm that is focused largely on compliance process metrics, and lacks a value platform for quantifying, documenting, and sustainably re-deploying re-capitalized natural asset capacity and capability. By adopting value-based system concepts similar to built infrastructure accounting and information management, 'stewarded' natural assets relegated to liability- or compliance-focused outcomes become 're-capitalized' operational assets able to support new or expanded mission. This growing need for new accounting and management paradigms to capture natural capital value is achieving global recognition, most recently by the United Nations, world leaders, and international corporations at the Rio+20 Summit in June of 2012. Natural Capital Asset Management (NCAM)™ is such an accounting framework tool. Using a quantification-based design, NCAM™ provides inventory, capacity and value data to owners or managers of natural assets such as the DOE that parallel comparable information systems currently used for facility assets. Applied to Environmental Management (EM) and other DOE program activities, the natural asset capacity and value generated by EM projects and other investment and operational programming can be recorded and then allocated to mission and/or ecosystem needs as part of overall site, complex, and Federal decision-making. NCAM™ can also document post-restoration asset capability and value for use in weighing loss mitigation and ecosystem damage claims arising from past operational activities. A prototype NCAM™ evaluation developed at the Savannah River Site (SRS) demonstrates use of this framework as an advanced paradigm for NCA accounting and decision-making for the larger DOE complex and other enterprises using natural capital in operations. Applying a quantified value paradigm, the framework catalogues the results of activities that sustain, restore, and modernize natural assets for enterprise-wide value beyond that of compliance milestones. Capturing and assigning recapitalization value using NCAM™ concepts and tools improves effective reuse of taxpayer-sustained assets, records ecosystem service value, enables mission and enterprise optimization, and assures the sustainability of shared natural capital assets in regional pools vital to both complex sites and local and regional economies.

  15. DOES THE AUTECOLOGY OF THE MANGROVE RIVULUS FISH (RIVULUS MARMORATUS) REFLECT A PARADIGM FOR MANGROVE ECOSYSTEM SENSITIVITY?

    EPA Science Inventory

    The killifish Rivulus marmoratus, the mangrove rivulus, represents one of the two potentially truly "mangrove dependent" fish species in western Atlantic mangrove ecosystems. The distribution of this species closely parallels the range of red mangroves. These plants and fish exhibi...

  16. Control Yourself, or at Least Your Core Self

    ERIC Educational Resources Information Center

    Austin, Lisa M.

    2010-01-01

    Contemporary privacy debates regarding new technologies often define privacy in terms of control over personal information such that the privacy "problem" is a lack of control and the privacy "solution" is increased control. This article questions the control-paradigm by pointing to its parallels with earlier debates in the…

  17. PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.

    PubMed

    Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A

    2016-06-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA with the aim of significantly shortening the code's execution time. Selected routines were parallelised using the OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures; the resulting speedup was substantially lower than the theoretical peak performance of the GPU, and the cause is explained. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
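
    The OpenMP side of the comparison can be pictured with a loop-level sketch of this kind; the per-pixel update and image layout below are hypothetical stand-ins, not DIRA's actual routines.

        // Hedged sketch of loop-level OpenMP parallelisation in the style the
        // DIRA authors describe; the kernel is a placeholder update.
        #include <omp.h>
        #include <vector>

        void iterate(std::vector<double>& image, const std::vector<double>& corr,
                     int nx, int ny) {
            // Each pixel update is independent, so the loop parallelises with a
            // single directive; near-linear speedup (about 15x on 16 cores in the
            // paper) is plausible when per-iteration work dominates overhead.
            #pragma omp parallel for schedule(static)
            for (int y = 0; y < ny; ++y)
                for (int x = 0; x < nx; ++x)
                    image[y * nx + x] += 0.5 * corr[y * nx + x];  // placeholder
        }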

  18. Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Biswas, Rupak

    2003-01-01

    This paper presents a detailed performance analysis of a multi-block overset grid computational fluid dynamics application on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism; the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SMP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, namely the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single system image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.
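
    The two levels of parallelism can be sketched as follows; solve_block() and the block count are illustrative assumptions, not the paper's solver.

        // Hedged sketch of the hybrid model: MPI ranks own whole grid blocks
        // (coarse grain) while OpenMP threads share the loops inside each block
        // (fine grain), so the machine can use more CPUs than there are blocks.
        #include <mpi.h>
        #include <omp.h>
        #include <vector>

        void solve_block(std::vector<double>& u) {
            #pragma omp parallel for            // fine grain: threads split the block
            for (long i = 0; i < (long)u.size(); ++i)
                u[i] *= 0.99;                   // placeholder relaxation step
        }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const int nblocks = 8;              // coarse grain: blocks across ranks
            std::vector<std::vector<double>> mine;
            for (int b = rank; b < nblocks; b += size)
                mine.emplace_back(1 << 16, 1.0);

            for (auto& block : mine) solve_block(block);

            MPI_Finalize();
            return 0;
        }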

  19. On extending parallelism to serial simulators

    NASA Technical Reports Server (NTRS)

    Nicol, David; Heidelberger, Philip

    1994-01-01

    This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount of simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.

  20. An interactive parallel programming environment applied in atmospheric science

    NASA Technical Reports Server (NTRS)

    vonLaszewski, G.

    1996-01-01

    This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.

  1. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

    2001-01-01

    This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.

  2. Answer Set Programming and Other Computing Paradigms

    ERIC Educational Resources Information Center

    Meng, Yunsong

    2013-01-01

    Answer Set Programming (ASP) is one of the most prominent and successful knowledge representation paradigms. The success of ASP is due to its expressive non-monotonic modeling language and its efficient computational methods originating from building propositional satisfiability solvers. The wide adoption of ASP has motivated several extensions to…

  3. A Paradigm Shift toward Evidence-Based Clinical Practice: Developing a Performance Assessment

    ERIC Educational Resources Information Center

    Wentworth, Nancy; Erickson, Lynnette B.; Lawrence, Barbara; Popham, J. Aaron; Korth, Byran

    2009-01-01

    The Clinical Practice Assessment System (CPAS), developed in response to teacher preparation program accreditation requirements, represents a paradigm shift of one university toward data-based decision-making in teacher education programs. The new assessment system is a scale aligned with INTASC Standards, which allows for observation and…

  4. The implementation of an aeronautical CFD flow code onto distributed memory parallel systems

    NASA Astrophysics Data System (ADS)

    Ierotheou, C. S.; Forsey, C. R.; Leatham, M.

    2000-04-01

    The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier-Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations.

  5. Real-space processing of helical filaments in SPARX

    PubMed Central

    Behrmann, Elmar; Tao, Guozhi; Stokes, David L.; Egelman, Edward H.; Raunser, Stefan; Penczek, Pawel A.

    2012-01-01

    We present a major revision of the iterative helical real-space refinement (IHRSR) procedure and its implementation in the SPARX single particle image processing environment. We built on over a decade of experience with IHRSR helical structure determination and took advantage of the flexible SPARX infrastructure to arrive at an implementation that offers ease of use, flexibility in designing a helical structure determination strategy, and high computational efficiency. We introduced 3D projection matching code that is now able to work with non-cubic volumes, a geometry better suited for long helical filaments; we enhanced procedures for establishing helical symmetry parameters; and we parallelized the code using a distributed memory paradigm. An additional feature is a graphical user interface that facilitates entering and editing the parameters controlling the structure determination strategy of the program. In addition, we present a novel approach to detect and evaluate structural heterogeneity due to conformer mixtures that takes advantage of helical structure redundancy. PMID:22248449

  6. Terascale High-Fidelity Simulations of Turbulent Combustion with Detailed Chemistry: Spray Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rutland, Christopher J.

    2009-04-26

    The Terascale High-Fidelity Simulations of Turbulent Combustion (TSTC) project is a multi-university collaborative effort to develop a high-fidelity turbulent reacting flow simulation capability utilizing terascale, massively parallel computer technology. The main paradigm of the approach is direct numerical simulation (DNS) featuring the highest temporal and spatial accuracy, allowing quantitative observations of the fine-scale physics found in turbulent reacting flows as well as providing a useful tool for development of sub-models needed in device-level simulations. Under this component of the TSTC program the simulation code named S3D, developed and shared with coworkers at Sandia National Laboratories, has been enhanced with new numerical algorithms and physical models to provide predictive capabilities for turbulent liquid fuel spray dynamics. Major accomplishments include improved fundamental understanding of mixing and auto-ignition in multi-phase turbulent reactant mixtures and turbulent fuel injection spray jets.

  7. GPU-based Parallel Application Design for Emerging Mobile Devices

    NASA Astrophysics Data System (ADS)

    Gupta, Kshitij

    A revolution is underway in the computing world that is causing a fundamental paradigm shift in device capabilities and form-factor, with a move from well-established legacy desktop/laptop computers to mobile devices in varying sizes and shapes. Amongst all the tasks these devices must support, graphics has emerged as the 'killer app' for providing a fluid user interface and high-fidelity game rendering, effectively making the graphics processor (GPU) one of the key components in (present and future) mobile systems. By utilizing the GPU as a general-purpose parallel processor, this dissertation explores the GPU computing design space from an applications standpoint, in the mobile context, by focusing on key challenges presented by these devices---limited compute, memory bandwidth, and stringent power consumption requirements---while improving the overall application efficiency of the increasingly important speech recognition workload for mobile user interaction. We broadly partition trends in GPU computing into four major categories. We analyze hardware and programming model limitations in current-generation GPUs and detail an alternate programming style called Persistent Threads, identify four use case patterns, and propose minimal modifications that would be required for extending native support. We show how by manually extracting data locality and altering the speech recognition pipeline, we are able to achieve significant savings in memory bandwidth while simultaneously reducing the compute burden on GPU-like parallel processors. As we foresee GPU computing to evolve from its current 'co-processor' model into an independent 'applications processor' that is capable of executing complex work independently, we create an alternate application framework that enables the GPU to handle all control-flow dependencies autonomously at run-time while minimizing host involvement to just issuing commands, that facilitates an efficient application implementation. Finally, as compute and communication capabilities of mobile devices improve, we analyze energy implications of processing speech recognition locally (on-chip) and offloading it to servers (in-cloud).

  8. Architecture Adaptive Computing Environment

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    2006-01-01

    Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.

  9. The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

    NASA Technical Reports Server (NTRS)

    Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited scalability of the resulting performance have affected the take-up of this programming model. Significant progress has since been made in hardware and software technologies; as a result, the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit to the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also shows the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.

  10. Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

    NASA Astrophysics Data System (ADS)

    Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

    We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.
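
    The difference in programming style the authors report can be seen on a toy loop; the kernel below is a placeholder, not the medical-imaging algorithm from the study.

        // Hedged sketch contrasting the two styles on the same loop.
        #include <omp.h>
        #include <tbb/parallel_for.h>
        #include <tbb/blocked_range.h>
        #include <vector>

        void scale_openmp(std::vector<float>& v) {
            #pragma omp parallel for           // one directive on the existing loop
            for (long i = 0; i < (long)v.size(); ++i) v[i] *= 2.0f;
        }

        void scale_tbb(std::vector<float>& v) {
            // TBB asks for a re-design around ranges and a callable body.
            tbb::parallel_for(tbb::blocked_range<std::size_t>(0, v.size()),
                [&](const tbb::blocked_range<std::size_t>& r) {
                    for (std::size_t i = r.begin(); i != r.end(); ++i) v[i] *= 2.0f;
                });
        }

    The OpenMP version leaves the serial loop intact, which matches the paper's finding that existing code parallelises with little effort, while the TBB version imposes the restructured, higher-abstraction style the authors found better suited to newly written programs.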

  11. The retailing of health care.

    PubMed

    Paul, T; Wong, J

    1984-01-01

    A number of striking parallels exist between recent developments in health care marketing and changes in the retailing industry. The authors have compared retailing paradigms to the area of health care marketing so that strategists in hospitals and other health care institutions can gain insight from these parallels. Many of the same economic, demographic, technological and lifestyle forces may be at work in both the health care and retail markets. While the services or products offered in health care are radically different from those of conventional retail markets, the manner in which the products and services are positioned, priced or distributed is surprisingly similar.

  12. ALEGRA -- A massively parallel h-adaptive code for solid dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Summers, R.M.; Wong, M.K.; Boucheron, E.A.

    1997-12-31

    ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.

  13. Atlas : A library for numerical weather prediction and climate modelling

    NASA Astrophysics Data System (ADS)

    Deconinck, Willem; Bauer, Peter; Diamantakis, Michail; Hamrud, Mats; Kühnlein, Christian; Maciel, Pedro; Mengaldo, Gianmarco; Quintino, Tiago; Raoult, Baudouin; Smolarkiewicz, Piotr K.; Wedi, Nils P.

    2017-11-01

    The algorithms underlying numerical weather prediction (NWP) and climate models that have been developed in the past few decades face an increasing challenge caused by the paradigm shift imposed by hardware vendors towards more energy-efficient devices. In order to provide a sustainable path to exascale High Performance Computing (HPC), applications become increasingly restricted by energy consumption. As a result, the emerging diverse and complex hardware solutions have a large impact on the programming models traditionally used in NWP software, triggering a rethink of design choices for future massively parallel software frameworks. In this paper, we present Atlas, a new software library that is currently being developed at the European Centre for Medium-Range Weather Forecasts (ECMWF), with the aim of handling data structures required for NWP applications in a flexible and massively parallel way. Atlas provides a versatile framework for the future development of efficient NWP and climate applications on emerging HPC architectures. The applications range from full Earth system models, to specific tools required for post-processing weather forecast products. The Atlas library thus constitutes a step towards affordable exascale high-performance simulations by providing the necessary abstractions that facilitate the application in heterogeneous HPC environments by promoting the co-design of NWP algorithms with the underlying hardware.

  14. I/O-Efficient Scientific Computation Using TPIE

    NASA Technical Reports Server (NTRS)

    Vengroff, Darren Erik; Vitter, Jeffrey Scott

    1996-01-01

    In recent years, input/output (I/O)-efficient algorithms for a wide variety of problems have appeared in the literature. However, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to support I/O-efficient paradigms for problems from a variety of domains, including computational geometry, graph algorithms, and scientific computation. The TPIE interface frees programmers from having to deal not only with explicit read and write calls, but also with the complex memory management that must be performed for I/O-efficient computation. In this paper we discuss applications of TPIE to problems in scientific computation. We discuss algorithmic issues underlying the design and implementation of the relevant components of TPIE and present performance results of programs written to solve a series of benchmark problems using our current TPIE prototype. Some of the benchmarks we present are based on the NAS parallel benchmarks while others are of our own creation. We demonstrate that the central processing unit (CPU) overhead required to manage I/O is small and that, even with just a single disk, the I/O overhead of I/O-efficient computation ranges from negligible to the same order of magnitude as CPU time. We conjecture that if we use a number of disks in parallel this overhead can be all but eliminated.

  15. An object-oriented approach to nested data parallelism

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.; Chatterjee, Siddhartha

    1994-01-01

    This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages, however, support nested data parallelism. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
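
    The flattening idea can be sketched in plain C++, with OpenMP standing in for the generated vector code, since the 'foreach' construct itself is the paper's language extension.

        // Hedged sketch of flattening: a nested elementwise operation over
        // collections of collections becomes one flat parallel loop over a
        // segmented array, so parallelism no longer depends on the nesting.
        #include <omp.h>
        #include <vector>
        #include <numeric>

        int main() {
            std::vector<std::vector<double>> nested = {{1, 2}, {3, 4, 5}, {6}};

            // Flatten: concatenate the elements and record segment offsets.
            std::vector<double> flat;
            std::vector<std::size_t> offset = {0};
            for (const auto& seg : nested) {
                flat.insert(flat.end(), seg.begin(), seg.end());
                offset.push_back(flat.size());
            }

            // The nested elementwise operation is now a single flat loop.
            #pragma omp parallel for
            for (long i = 0; i < (long)flat.size(); ++i) flat[i] *= flat[i];

            // Segment boundaries recover per-collection results afterwards.
            for (std::size_t s = 0; s + 1 < offset.size(); ++s) {
                double sum = std::accumulate(flat.begin() + offset[s],
                                             flat.begin() + offset[s + 1], 0.0);
                (void)sum;  // e.g., a per-collection reduction result
            }
            return 0;
        }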

  16. Formal verification of mathematical software

    NASA Technical Reports Server (NTRS)

    Sutherland, D.

    1984-01-01

    Methods are investigated for formally specifying and verifying the correctness of mathematical software (software which uses floating point numbers and arithmetic). Previous work in the field was reviewed. A new model of floating point arithmetic called the asymptotic paradigm was developed and formalized. Two different conceptual approaches to program verification, the classical Verification Condition approach and the more recently developed Programming Logic approach, were adapted to use the asymptotic paradigm. These approaches were then used to verify several programs; the programs chosen were simplified versions of actual mathematical software.

  17. The BLAZE language: A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.

  18. Applied Biomechanics in an Instructional Setting

    ERIC Educational Resources Information Center

    Hudson, Jackie L.

    2006-01-01

    Biomechanics is the science of how people move better, meaning more skillfully and more safely. This article places more emphasis on skill rather than safety, though there are many parallels between them. It shares a few features of the author's paradigm of applied biomechanics and discusses an integrated approach toward a middle school football…

  19. Work at the Uddevalla Volvo Plant from the Perspective of the Demand-Control Model

    ERIC Educational Resources Information Center

    Lottridge, Danielle

    2004-01-01

    The Uddevalla Volvo plant represents a different paradigm for automotive assembly. In parallel-flow work, self-managed work groups assemble entire automobiles with comparable productivity as conventional series-flow assembly lines. From the perspective of the demand-control model, operators at the Uddevalla plant have low physical and timing…

  20. Adapting high-level language programs for parallel processing using data flow

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  1. [Assistance to the climacteric woman: new paradigms].

    PubMed

    Lorenzi, Dino Roberto Soares De; Catan, Lenita Binelli; Moreira, Karen; Artico, Graziela Rech

    2009-01-01

    Population aging is a demographic reality in Brazil. Consequently, in the coming years a progressive increase is expected in the use of the country's health care services by women with complaints related to the climacterium. In parallel, assistance during this stage of a woman's life has been going through a paradigm shift that has required of health professionals a change of attitude toward it. Today it is acknowledged that the climacterium is influenced by biological, psychosocial and cultural factors, knowledge of which is fundamental for planning more qualified and humanized care. This article proposes a reflection on the paradigm shifts in assistance at the climacterium, highlighting important aspects such as multidisciplinarity and interdisciplinarity, so as to better serve this portion of the population and provide it with more integrated and individualized care, bringing together knowledge and sensitivity, always aiming at a better quality of life.

  2. Collectively loading programs in a multiple program multiple data environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

    Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, the nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader, which accesses a file system to load the program associated with the class route and broadcasts the program via the class route to other nodes which require the program.
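
    The class-route mechanism is machine-specific, but an MPI analogue conveys its shape; program_of_rank() and the file names below are invented stand-ins for the load description file, not the disclosed implementation.

        // Hedged MPI analogue of the class-route idea: ranks needing the same
        // program form a sub-communicator, and one "load leader" per group
        // touches the file system and broadcasts the image to the rest.
        #include <mpi.h>
        #include <vector>
        #include <fstream>
        #include <iterator>

        static int program_of_rank(int rank) { return rank % 2; }  // toy description

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            // One sub-communicator per program: the analogue of a class route.
            MPI_Comm group;
            MPI_Comm_split(MPI_COMM_WORLD, program_of_rank(rank), rank, &group);

            int grank;
            MPI_Comm_rank(group, &grank);
            std::vector<char> image;
            long len = 0;
            if (grank == 0) {                  // load leader: only rank to hit the FS
                std::ifstream f(program_of_rank(rank) == 0 ? "progA.bin" : "progB.bin",
                                std::ios::binary);
                image.assign(std::istreambuf_iterator<char>(f),
                             std::istreambuf_iterator<char>());
                len = (long)image.size();
            }
            MPI_Bcast(&len, 1, MPI_LONG, 0, group);              // size first
            image.resize(len);
            MPI_Bcast(image.data(), (int)len, MPI_BYTE, 0, group);  // then the bytes

            MPI_Comm_free(&group);
            MPI_Finalize();
            return 0;
        }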

  3. Melodic multi-feature paradigm reveals auditory profiles in music-sound encoding.

    PubMed

    Tervaniemi, Mari; Huotilainen, Minna; Brattico, Elvira

    2014-01-01

    Musical expertise modulates preattentive neural sound discrimination. However, this evidence to a great extent originates from paradigms using very simple stimulation. Here we use a novel melody paradigm (revealing the auditory profile for six sound parameters in parallel) to compare memory-related mismatch negativity (MMN) and attention-related P3a responses recorded from non-musicians and Finnish folk musicians. MMN emerged in both groups of participants for all sound changes (except for rhythmic changes in non-musicians). In folk musicians, the MMN was enlarged for mistuned sounds when compared with non-musicians. This is taken to reflect their familiarity with pitch information, which holds a key position in Finnish folk music when compared with, e.g., rhythmic information. The MMN was followed by P3a after timbre changes, rhythm changes, and melody transposition. The MMN and P3a topographies differentiated the groups for all sound changes. Thus, the melody paradigm offers a fast and cost-effective means for determining the auditory profile for music-sound encoding and also, importantly, for probing the effects of musical expertise on it.

  4. STOP-IT: Windows executable software for the stop-signal paradigm.

    PubMed

    Verbruggen, Frederick; Logan, Gordon D; Stevens, Michaël A

    2008-05-01

    The stop-signal paradigm is a useful tool for the investigation of response inhibition. In this paradigm, subjects are instructed to respond as fast as possible to a stimulus unless a stop signal is presented after a variable delay. However, programming the stop-signal task is typically considered to be difficult. To overcome this issue, we present software called STOP-IT for running the stop-signal task, as well as an additional analysis program called ANALYZE-IT. The main advantage of both programs is that they are precompiled executables, and for basic use there is no need for additional programming. STOP-IT and ANALYZE-IT are completely based on free software, are distributed under the GNU General Public License, and are available at the personal Web sites of the first two authors or at expsy.ugent.be/tscope/stop.html.

  5. The BLAZE language - A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  6. Visualization assisted by parallel processing

    NASA Astrophysics Data System (ADS)

    Lange, B.; Rey, H.; Vasques, X.; Puech, W.; Rodriguez, N.

    2011-01-01

    This paper discusses the experimental results of our visualization model for data extracted from sensors. The objective of this paper is to find a computationally efficient method to produce a real-time rendering visualization for a large amount of data. We develop a visualization method to monitor the temperature variance of a data center. Sensors are placed on three layers and do not cover the whole room. We use the particle paradigm to interpolate the sensor data; particles model the "space" of the room. In this work we partition the particle set using two mathematical methods, Delaunay triangulation and Voronoi cells, both presented by Avis and Bhattacharya. Particles provide information on the room temperature at different coordinates over time. To locate and update particle data we define a computational cost function. To solve this function efficiently, we use a client-server paradigm: the server computes the data and the client displays it on different kinds of hardware. This paper is organized as follows. The first part presents related algorithms used to visualize large flows of data. The second part presents the different platforms and methods that were evaluated in order to determine the best solution for the proposed task. The benchmark uses the computational cost of our algorithm, which is based on locating particles relative to sensors and on updating particle values. The benchmark was run on a personal computer using CPU, multi-core, GPU, and hybrid GPU/CPU programming. GPU programming is a growing method in the research field; it allows real-time rendering instead of precomputed rendering. To improve our results, we also ran our algorithm on a High Performance Computing (HPC) system; this benchmark was used to improve the multi-core method. HPC is commonly used in data visualization (astronomy, physics, etc.) to improve rendering and achieve real time.

  7. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    NASA Astrophysics Data System (ADS)

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computing increases.
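
    The pattern, in which one master rank hands independent serial runs to worker ranks on demand, might be sketched like this; run_xstar() and its wrapper script are hypothetical placeholders for invoking the real code, not MPI_XSTAR's actual interface.

        // Hedged sketch of an MPI task farm over independent serial runs.
        #include <mpi.h>
        #include <cstdlib>
        #include <cstdio>

        static void run_xstar(int job) {
            char cmd[64];
            std::snprintf(cmd, sizeof cmd, "./run_xstar.sh %d", job);  // hypothetical
            std::system(cmd);
        }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            const int njobs = 100;

            if (rank == 0) {                       // master: dynamic scheduling
                int next = 0, done = 0;
                while (done < size - 1) {
                    int dummy;
                    MPI_Status st;
                    MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                             MPI_COMM_WORLD, &st);
                    int job = (next < njobs) ? next++ : -1;  // -1 means stop
                    if (job < 0) ++done;
                    MPI_Send(&job, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
                }
            } else {                               // worker: request, run, repeat
                for (;;) {
                    int ask = 0, job;
                    MPI_Send(&ask, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
                    MPI_Recv(&job, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    if (job < 0) break;
                    run_xstar(job);
                }
            }
            MPI_Finalize();
            return 0;
        }

    Dynamic hand-out keeps all workers busy even when individual runs take very different times, which is consistent with the speedup-versus-efficiency trade-off the record reports.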

  8. IOPA: I/O-aware parallelism adaption for parallel programs

    PubMed Central

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; conversely, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay close attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
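
    The abstract does not spell out IOPA's control algorithm, so the following is only a guessed-at feedback scheme in that spirit: grow the I/O thread count while measured throughput keeps improving, and stop at the first sign of saturation.

        // Hedged sketch of adaptive I/O parallelism control (not IOPA's code).
        #include <chrono>
        #include <thread>
        #include <vector>
        #include <atomic>
        #include <cstdio>

        std::atomic<long> bytes_read{0};

        void io_worker(long chunk) {
            // Placeholder for a real read(); just account for the bytes.
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
            bytes_read += chunk;
        }

        static double throughput_with(int nthreads, long chunk) {
            bytes_read = 0;
            auto t0 = std::chrono::steady_clock::now();
            std::vector<std::thread> ts;
            for (int i = 0; i < nthreads; ++i) ts.emplace_back(io_worker, chunk);
            for (auto& t : ts) t.join();
            std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
            return bytes_read / dt.count();
        }

        int main() {
            const long chunk = 1 << 20;
            int n = 1;
            double best = throughput_with(n, chunk);
            // Double the thread count while the I/O sub-system still scales.
            while (n < 64) {
                double tp = throughput_with(2 * n, chunk);
                if (tp <= best * 1.05) break;  // <5% gain: I/O is saturated
                best = tp;
                n *= 2;
            }
            std::printf("chosen I/O parallelism: %d threads\n", n);
            return 0;
        }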

  9. IOPA: I/O-aware parallelism adaption for parallel programs.

    PubMed

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; conversely, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay close attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.

  10. Writing IEPs for the Audience of Teachers, Parents, and Students: The Case for the Communicative Individual Education Program

    ERIC Educational Resources Information Center

    Lucido, Richard J.

    2013-01-01

    Our current paradigm of Individualized Education Program (IEP) construction, the creation of a legal document or contract between parents and schools, needs rethinking. This paradigm's unintended consequences have resulted in vast inefficiencies in special education and have detracted from the core purpose of the IEP document, the communication of…

  11. On a model of three-dimensional bursting and its parallel implementation

    NASA Astrophysics Data System (ADS)

    Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.

    2008-04-01

    A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a linearly-implicit finite difference method in equally-spaced grids, second-order accurate in both space and time. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared address space paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results is proposed that emphasizes the most relevant factors affecting the performance of the parallel implementations. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.
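
    The PCG kernel used at each time level can be sketched generically; the 1-D model matrix and the Jacobi (diagonal) preconditioner below are assumptions for illustration, not the paper's linearly-implicit operator.

        // Hedged sketch of Preconditioned Conjugate Gradient on a toy SPD
        // tridiagonal system, with Jacobi preconditioning.
        #include <vector>
        #include <cmath>
        #include <cstdio>

        using Vec = std::vector<double>;

        static void apply_A(const Vec& x, Vec& y) {  // y = A x, A = tridiag(-1, 2.1, -1)
            const double d = 2.1;
            const std::size_t n = x.size();
            for (std::size_t i = 0; i < n; ++i)
                y[i] = d * x[i] - (i ? x[i - 1] : 0.0) - (i + 1 < n ? x[i + 1] : 0.0);
        }

        static double dot(const Vec& a, const Vec& b) {
            double s = 0;
            for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
            return s;
        }

        int main() {
            const std::size_t n = 1000;
            Vec x(n, 0.0), b(n, 1.0), r = b, z(n), p(n), Ap(n);
            const double dinv = 1.0 / 2.1;           // Jacobi preconditioner M^-1
            for (std::size_t i = 0; i < n; ++i) z[i] = dinv * r[i];
            p = z;
            double rz = dot(r, z);
            for (int it = 0; it < 1000 && std::sqrt(dot(r, r)) > 1e-10; ++it) {
                apply_A(p, Ap);
                double alpha = rz / dot(p, Ap);
                for (std::size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
                for (std::size_t i = 0; i < n; ++i) z[i] = dinv * r[i];
                double rz_new = dot(r, z);
                double beta = rz_new / rz;
                rz = rz_new;
                for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
            }
            std::printf("residual %.3e\n", std::sqrt(dot(r, r)));
            return 0;
        }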

  12. Parallel language constructs for tensor product computations on loosely coupled architectures

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Vanrosendale, John

    1989-01-01

    Distributed memory architectures offer high levels of performance and flexibility, but have proven awkward to program. Current languages for nonshared memory architectures provide a relatively low-level programming environment and are poorly suited to modular programming and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The focus is on tensor product array computations, a simple but important class of numerical algorithms. The problem of programming 1-D kernel routines, such as parallel tridiagonal solvers, is considered first; it is then examined how such parallel kernels can be combined to form parallel tensor product algorithms.
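
    A serial Thomas solve is the kind of 1-D kernel the proposed primitives would compose across processors; the composition layer itself is not shown, and this sketch is a generic textbook routine rather than anything from the paper.

        // Hedged sketch of a tridiagonal (Thomas) solve:
        //   a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = d[i]
        // a[0] and c[n-1] are unused; d is overwritten with the solution.
        #include <vector>

        void thomas(const std::vector<double>& a, std::vector<double> b,
                    const std::vector<double>& c, std::vector<double>& d) {
            const std::size_t n = b.size();
            for (std::size_t i = 1; i < n; ++i) {  // forward elimination
                double m = a[i] / b[i - 1];
                b[i] -= m * c[i - 1];
                d[i] -= m * d[i - 1];
            }
            d[n - 1] /= b[n - 1];                  // back substitution
            for (std::size_t i = n - 1; i-- > 0; )
                d[i] = (d[i] - c[i] * d[i + 1]) / b[i];
        }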

  13. In Search of a Paradigm for Bilingual Education.

    ERIC Educational Resources Information Center

    Aquirre, Adalberto, Jr.

    The success of a new paradigm depends upon the level of coherency and organization within its supporting community, and Thomas Kuhn's argument concerning the resistance to new paradigms and their power to change the existing order can be used in the context of bilingual education. Bilingual education programs have the potential to create a…

  14. Methods for design and evaluation of parallel computing systems (The PISCES project)

    NASA Technical Reports Server (NTRS)

    Pratt, Terrence W.; Wise, Robert; Haught, Mary JO

    1989-01-01

    The PISCES project started in 1984 under the sponsorship of the NASA Computational Structural Mechanics (CSM) program. A PISCES 1 programming environment and parallel FORTRAN were implemented in 1984 for the DEC VAX (using UNIX processes to simulate parallel processes). This system was used for experimentation with parallel programs for scientific applications and AI (dynamic scene analysis) applications. PISCES 1 was ported to a network of Apollo workstations by N. Fitzgerald.

  15. The Legacy of Science

    NASA Technical Reports Server (NTRS)

    Burke, J.

    1985-01-01

    The mechanisms of techno-scientific and philosophical change are examined and related to the nature and transformation of society and mankind himself. In parallel with the notion that the fundamental mechanism of change is the free juxtaposition of disparate phenomena, it is suggested that, with the tools that modern technology provides, we may be moving toward a no-paradigm culture.

  16. Analog Processor To Solve Optimization Problems

    NASA Technical Reports Server (NTRS)

    Duong, Tuan A.; Eberhardt, Silvio P.; Thakoor, Anil P.

    1993-01-01

    Proposed analog processor solves "traveling-salesman" problem, considered paradigm of global-optimization problems involving routing or allocation of resources. Includes electronic neural network and auxiliary circuitry based partly on concepts described in "Neural-Network Processor Would Allocate Resources" (NPO-17781) and "Neural Network Solves 'Traveling-Salesman' Problem" (NPO-17807). Processor based on highly parallel computing solves problem in significantly less time.

  17. A Novel Design of 4-Class BCI Using Two Binary Classifiers and Parallel Mental Tasks

    PubMed Central

    Geng, Tao; Gan, John Q.; Dyson, Matthew; Tsui, Chun SL; Sepulveda, Francisco

    2008-01-01

    A novel 4-class single-trial brain computer interface (BCI) based on two (rather than four or more) binary linear discriminant analysis (LDA) classifiers is proposed, which is called a “parallel BCI.” Unlike other BCIs where mental tasks are executed and classified in a serial way one after another, the parallel BCI uses properly designed parallel mental tasks that are executed on both sides of the subject body simultaneously, which is the main novelty of the BCI paradigm used in our experiments. Each of the two binary classifiers only classifies the mental tasks executed on one side of the subject body, and the results of the two binary classifiers are combined to give the result of the 4-class BCI. Data was recorded in experiments with both real movement and motor imagery in 3 able-bodied subjects. Artifacts were not detected or removed. Offline analysis has shown that, in some subjects, the parallel BCI can generate a higher accuracy than a conventional 4-class BCI, although both of them have used the same feature selection and classification algorithms. PMID:18584040
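
    The combination step is simple enough to show directly; the thresholds and labels below are illustrative, and lda_left/lda_right stand for the two binary classifiers' discriminant scores (assumed available from earlier processing).

        // Hedged sketch: two binary LDA outcomes, one per body side, index
        // the four classes of the parallel BCI.
        #include <array>
        #include <string>

        std::string classify4(double lda_left, double lda_right) {
            int left  = lda_left  > 0.0;   // task A vs task B on the left side
            int right = lda_right > 0.0;   // task A vs task B on the right side
            static const std::array<std::string, 4> label = {
                "left-A/right-A", "left-A/right-B",
                "left-B/right-A", "left-B/right-B"};
            return label[2 * left + right];  // two bits select one of four classes
        }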

  18. Sublattice parallel replica dynamics.

    PubMed

    Martínez, Enrique; Uberuaga, Blas P; Voter, Arthur F

    2014-06-01

    Exascale computing presents a challenge for the scientific community as new algorithms must be developed to take full advantage of the new computing paradigm. Atomistic simulation methods that offer full fidelity to the underlying potential, i.e., molecular dynamics (MD) and parallel replica dynamics, fail to use the whole machine speedup, leaving a region in time and sample size space that is unattainable with current algorithms. In this paper, we present an extension of the parallel replica dynamics algorithm [A. F. Voter, Phys. Rev. B 57, R13985 (1998)] by combining it with the synchronous sublattice approach of Shim and Amar [Phys. Rev. B 71, 125432 (2005)], thereby exploiting event locality to improve the algorithm's scalability. This algorithm is based on a domain decomposition in which events happen independently in different regions in the sample. We develop an analytical expression for the speedup given by this sublattice parallel replica dynamics algorithm and compare it with parallel MD and traditional parallel replica dynamics. We demonstrate how this algorithm, which introduces a slight additional approximation of event locality, enables the study of physical systems unreachable with traditional methodologies and promises to better utilize the resources of current high performance and future exascale computers.
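
    The paper's analytical speedup expression is not reproduced in this record; as a back-of-the-envelope model only (our sketch, not the authors' derivation), parallel replica dynamics with M replicas gives roughly

        S(M) \approx \frac{M\,\tau_{\mathrm{esc}}}{\tau_{\mathrm{esc}} + M\,\tau_{\mathrm{oh}}}

    where \tau_{esc} is the mean time between rare events and \tau_{oh} the per-event overhead (dephasing and correlated-event handling), so S approaches M while events stay rare. Decomposing the sample into N_d weakly coupled sublattice domains then raises the usable parallelism toward M N_d, as long as events remain local to their domains.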

  19. Development and Application of a Parallel LCAO Cluster Method

    NASA Astrophysics Data System (ADS)

    Patton, David C.

    1997-08-01

    CPU intensive steps in the SCF electronic structure calculations of clusters and molecules with a first-principles LCAO method have been fully parallelized via a message passing paradigm. Identification of the parts of the code that are composed of many independent compute-intensive steps is discussed in detail as they are the most readily parallelized. Most of the parallelization involves spatially decomposing numerical operations on a mesh. One exception is the solution of Poisson's equation which relies on distribution of the charge density and multipole methods. The method we use to parallelize this part of the calculation is quite novel and is covered in detail. We present a general method for dynamically load-balancing a parallel calculation and discuss how we use this method in our code. The results of benchmark calculations of the IR and Raman spectra of PAH molecules such as anthracene (C_14H_10) and tetracene (C_18H_12) are presented. These benchmark calculations were performed on an IBM SP2 and a SUN Ultra HPC server with both MPI and PVM. Scalability and speedup for these calculations is analyzed to determine the efficiency of the code. In addition, performance and usage issues for MPI and PVM are presented.

  20. Computer-aided programming for message-passing system; Problems and a solution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, M.Y.; Gajski, D.D.

    1989-12-01

    As the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. Program development tools are necessary since programmers are not able to develop complex parallel programs efficiently. Parallel models of computation, parallelization problems, and tools for computer-aided programming (CAP) are discussed. As an example, a CAP tool that performs scheduling and inserts communication primitives automatically is described. It also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs.

  1. Efficacy and safety of sacubitril/valsartan (LCZ696) in Japanese patients with chronic heart failure and reduced ejection fraction: Rationale for and design of the randomized, double-blind PARALLEL-HF study.

    PubMed

    Tsutsui, Hiroyuki; Momomura, Shinichi; Saito, Yoshihiko; Ito, Hiroshi; Yamamoto, Kazuhiro; Ohishi, Tomomi; Okino, Naoko; Guo, Weinong

    2017-09-01

    The prognosis of heart failure patients with reduced ejection fraction (HFrEF) in Japan remains poor, although there is growing evidence for increasing use of evidence-based pharmacotherapies in Japanese real-world HF registries. Sacubitril/valsartan (LCZ696) is a first-in-class angiotensin receptor neprilysin inhibitor shown to reduce mortality and morbidity in the recently completed largest outcome trial in patients with HFrEF (PARADIGM-HF trial). The prospectively designed phase III PARALLEL-HF (Prospective comparison of ARNI with ACE inhibitor to determine the noveL beneficiaL trEatment vaLue in Japanese Heart Failure patients) study aims to assess the clinical efficacy and safety of LCZ696 in Japanese HFrEF patients, and to show improvements in clinical outcomes similar to the PARADIGM-HF study, enabling the registration of LCZ696 in Japan. This is a multicenter, randomized, double-blind, parallel-group, active controlled study of 220 Japanese HFrEF patients. Eligibility criteria include a diagnosis of chronic HF (New York Heart Association Class II-IV), reduced ejection fraction (left ventricular ejection fraction ≤35%), and increased plasma concentrations of natriuretic peptides [N-terminal pro B-type natriuretic peptide (NT-proBNP) ≥600 pg/mL, or NT-proBNP ≥400 pg/mL for those who had a hospitalization for HF within the last 12 months] at the screening visit. The study consists of three phases: (i) screening, (ii) single-blind active LCZ696 run-in, and (iii) double-blind randomized treatment. Patients tolerating LCZ696 50 mg bid during the treatment run-in are randomized (1:1) to receive LCZ696 100 mg bid or enalapril 5 mg bid for 4 weeks, followed by up-titration to target doses of LCZ696 200 mg bid or enalapril 10 mg bid in a double-blind manner. The primary outcome is the composite of cardiovascular death or HF hospitalization, and the study is event-driven. The design of the PARALLEL-HF study is aligned with the PARADIGM-HF study and aims to assess the efficacy and safety of LCZ696 in Japanese HFrEF patients. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  2. Parallel implementation of an adaptive and parameter-free N-body integrator

    NASA Astrophysics Data System (ADS)

    Pruett, C. David; Ingham, William H.; Herman, Ralph D.

    2011-05-01

    Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summary: Program title: PNB.f90. Catalogue identifier: AEIK_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 3052. No. of bytes in distributed program, including test data, etc.: 68 600. Distribution format: tar.gz. Programming language: Fortran 90 and OpenMPI. Computer: all shared or distributed memory parallel processors. Operating system: Unix/Linux. Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized. RAM: dependent upon N. Classification: 4.3, 4.12, 6.5. Nature of problem: high-accuracy numerical evaluation of trajectories of N point masses, each subject to Newtonian gravitation. Solution method: parallel and adaptive extrapolation in time via power series of arbitrary degree. Running time: 5.1 s for the demo program supplied with the package.

  3. Parallel solution of sparse one-dimensional dynamic programming problems

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1989-01-01

    Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.

  4. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends

    PubMed Central

    2014-01-01

    The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so-called “big data” challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital that big data solutions be multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and the graphics processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields. PMID:25383096
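
    To make the two phases concrete, here is a minimal single-process word-count sketch in C that mimics the Map, shuffle/sort, and Reduce stages. In Hadoop the map and reduce tasks would run in parallel over HDFS blocks on the slave nodes; none of the names below belong to any real Hadoop API:

        /* Conceptual sketch of the two MapReduce phases (word count).
           Runs sequentially in one process; illustrative only. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <ctype.h>

        #define MAX_PAIRS 1024
        #define MAX_WORD  64

        typedef struct { char key[MAX_WORD]; int value; } Pair;

        static Pair pairs[MAX_PAIRS];
        static int  npairs = 0;

        /* map: split one record (a line of text) into (word, 1) pairs */
        static void map(const char *line)
        {
            char word[MAX_WORD];
            int w = 0;
            for (const char *p = line; ; p++) {
                if (isalpha((unsigned char)*p)) {
                    if (w < MAX_WORD - 1)
                        word[w++] = (char)tolower((unsigned char)*p);
                } else {
                    if (w > 0 && npairs < MAX_PAIRS) {
                        word[w] = '\0';
                        strcpy(pairs[npairs].key, word);
                        pairs[npairs].value = 1;
                        npairs++;
                    }
                    w = 0;
                    if (*p == '\0') break;
                }
            }
        }

        static int cmp(const void *a, const void *b)
        {
            return strcmp(((const Pair *)a)->key, ((const Pair *)b)->key);
        }

        int main(void)
        {
            const char *records[] = { "the quick brown fox", "the lazy dog", "the fox" };
            for (int i = 0; i < 3; i++)
                map(records[i]);                       /* map phase */

            qsort(pairs, npairs, sizeof(Pair), cmp);   /* shuffle/sort: group by key */

            /* reduce phase: sum the values of each group of equal keys */
            for (int i = 0; i < npairs; ) {
                int j = i, sum = 0;
                while (j < npairs && strcmp(pairs[j].key, pairs[i].key) == 0)
                    sum += pairs[j++].value;
                printf("%s\t%d\n", pairs[i].key, sum);
                i = j;
            }
            return 0;
        }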

  5. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends.

    PubMed

    Mohammed, Emad A; Far, Behrouz H; Naugler, Christopher

    2014-01-01

    The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so-called "big data" challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital that big data solutions be multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and the graphics processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields.

  6. 76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-26

    ... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Medicare and Medicaid Services [CMS-3180-N2] Food and Drug Administration [Docket No. FDA-2010-N-0308] Pilot Program for Parallel Review of Medical... technologies to participate in a program of parallel FDA-CMS review. The document was published with an...

  7. Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

    PubMed

    Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

    2014-07-05

    A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limit on the degree of parallelization because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on a large and fine calculation cell (2048³ grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent parallel scalability on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of a chymotrypsin inhibitor 2 mutant. The results demonstrate that the massively parallel 3D-RISM program is effective for analyzing the hydration properties of large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.

  8. F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture-dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called F-Nets, utilizing a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).

  9. Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    NASA Technical Reports Server (NTRS)

    Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

    1994-01-01

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.

  10. Symbolic Analysis of Concurrent Programs with Polymorphism

    NASA Technical Reports Server (NTRS)

    Rungta, Neha Shyam

    2010-01-01

    The current trend of multi-core and multi-processor computing is causing a paradigm shift from inherently sequential to highly concurrent and parallel applications. Certain thread interleavings, data input values, or combinations of both often cause errors in the system. Systematic verification techniques such as explicit state model checking and symbolic execution are extensively used to detect errors in such systems [7, 9]. Explicit state model checking enumerates possible thread schedules and input data values of a program in order to check for errors [3, 9]. To partially mitigate the state space explosion from data input values, symbolic execution techniques substitute data input values with symbolic values [5, 7, 6]. Explicit state model checking and symbolic execution techniques used in conjunction with exhaustive search techniques such as depth-first search are unable to detect errors in medium to large-sized concurrent programs because the number of behaviors caused by data and thread non-determinism is extremely large. We present an overview of abstraction-guided symbolic execution for concurrent programs that detects errors manifested by a combination of thread schedules and data values [8]. The technique generates a set of key program locations relevant in testing the reachability of the target locations. The symbolic execution is then guided along these locations in an attempt to generate a feasible execution path to the error state. This allows the execution to focus in parts of the behavior space more likely to contain an error.

  11. Using CLIPS in the domain of knowledge-based massively parallel programming

    NASA Technical Reports Server (NTRS)

    Dvorak, Jiri J.

    1994-01-01

    The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency with respect to parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about the application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with the C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering is discussed.

  12. Implementations of BLAST for parallel computers.

    PubMed

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.

  13. Programming parallel architectures: The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  14. A scheme for solving the plane-plane challenge in force measurements at the nanoscale.

    PubMed

    Siria, Alessandro; Huant, Serge; Auvert, Geoffroy; Comin, Fabio; Chevrier, Joel

    2010-05-19

    Non-contact interaction between two parallel flat surfaces is a central paradigm in science. This situation is the starting point for a wealth of different models: the capacitor description in electrostatics, hydrodynamic flow, thermal exchange, the Casimir force, direct contact study, third-body confinement such as liquids or films of soft condensed matter. The control of parallelism is so demanding that no versatile single force machine in this geometry has been proposed so far. Using a combination of nanopositioning based on inertial motors, of microcrystal shaping with a focused-ion beam (FIB), and of accurate in situ and real-time control of surface parallelism with X-ray diffraction, we propose here a "gedanken" surface-force machine that should enable one to measure interactions between movable surfaces separated by gaps in the micrometer and nanometer ranges.

  15. Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

    Modern hardware contains parallel execution resources that are well-suited for data-parallelism (vector units) and task parallelism (multicores). However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently on multicores. Our framework allows us to define schedulers that can dynamically select between executing task blocks on vector units or multicores. We show that these schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel programs into task-block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.
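
    The pattern such a unified scheduler targets can be sketched with plain OpenMP: a recursive task-parallel tree whose leaves are vectorizable data-parallel loops. This is only an illustration of the two granularities being combined, not the authors' task-block scheduler:

        /* Illustrative C/OpenMP sketch of the two kinds of parallelism the
           record's "task blocks" unify: task parallelism (recursive tasks
           on cores) and data parallelism (a vectorizable leaf loop). */
        #include <stdio.h>

        #define LEAF 1024

        /* data-parallel leaf: suited to vector units */
        static double leaf_sum(const double *a, int n)
        {
            double s = 0.0;
            #pragma omp simd reduction(+:s)
            for (int i = 0; i < n; i++)
                s += a[i] * a[i];
            return s;
        }

        /* task-parallel recursion: suited to multicores */
        static double tree_sum(const double *a, int n)
        {
            if (n <= LEAF)
                return leaf_sum(a, n);
            double left, right;
            #pragma omp task shared(left)
            left = tree_sum(a, n / 2);
            right = tree_sum(a + n / 2, n - n / 2);
            #pragma omp taskwait
            return left + right;
        }

        int main(void)
        {
            static double a[1 << 20];
            for (int i = 0; i < (1 << 20); i++) a[i] = 1.0;
            double s;
            #pragma omp parallel
            #pragma omp single
            s = tree_sum(a, 1 << 20);
            printf("%f\n", s);   /* expect 1048576.0 */
            return 0;
        }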

  16. Photochromic molecular implementations of universal computation.

    PubMed

    Chaplin, Jack C; Krasnogor, Natalio; Russell, Noah A

    2014-12-01

    Unconventional computing is an area of research in which novel materials and paradigms are utilised to implement computation. Previously we have demonstrated how registers, logic gates and logic circuits can be implemented, unconventionally, with a biocompatible molecular switch, NitroBIPS, embedded in a polymer matrix. NitroBIPS and related molecules have been shown elsewhere to be capable of modifying many biological processes in a manner that is dependent on its molecular form. Thus, one possible application of this type of unconventional computing is to embed computational processes into biological systems. Here we expand on our earlier proof-of-principle work and demonstrate that universal computation can be implemented using NitroBIPS. We have previously shown that spatially localised computational elements, including registers and logic gates, can be produced. We explain how parallel registers can be implemented, then demonstrate an application of parallel registers in the form of Turing machine tapes, and demonstrate both parallel registers and logic circuits in the form of elementary cellular automata. The Turing machines and elementary cellular automata utilise the same samples and same hardware to implement their registers, logic gates and logic circuits; and both represent examples of universal computing paradigms. This shows that homogeneous photochromic computational devices can be dynamically repurposed without invasive reconfiguration. The result represents an important, necessary step towards demonstrating the general feasibility of interfacial computation embedded in biological systems or other unconventional materials and environments. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  17. High-performance computing — an overview

    NASA Astrophysics Data System (ADS)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  18. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    Real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool and a very data-intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper surveys approaches for the parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we compare various parallelizations of sequential watershed algorithms on shared memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on its performance. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare OpenMP (an application programming interface for multiprocessing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
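
    The effort gap between the two models the survey compares shows up on even a trivial per-pixel pass. The hedged C sketch below parallelizes the same loop once with an OpenMP directive and once with explicit Pthreads; a real watershed transform needs far more coordination than this, so the example only illustrates the programming-model difference:

        /* The same embarrassingly parallel pixel pass written both ways. */
        #include <pthread.h>

        #define NPIX (1 << 20)
        #define NTHREADS 4

        static unsigned char img[NPIX], out[NPIX];

        /* OpenMP: one directive on the sequential loop */
        static void threshold_omp(unsigned char t)
        {
            #pragma omp parallel for schedule(static)
            for (long i = 0; i < NPIX; i++)
                out[i] = img[i] > t ? 255 : 0;
        }

        /* Pthreads: explicit thread creation, work splitting, and join */
        typedef struct { long lo, hi; unsigned char t; } Chunk;

        static void *threshold_chunk(void *arg)
        {
            Chunk *c = arg;
            for (long i = c->lo; i < c->hi; i++)
                out[i] = img[i] > c->t ? 255 : 0;
            return NULL;
        }

        static void threshold_pthreads(unsigned char t)
        {
            pthread_t tid[NTHREADS];
            Chunk ck[NTHREADS];
            long step = NPIX / NTHREADS;
            for (int k = 0; k < NTHREADS; k++) {
                ck[k].lo = k * step;
                ck[k].hi = (k == NTHREADS - 1) ? NPIX : (k + 1) * step;
                ck[k].t  = t;
                pthread_create(&tid[k], NULL, threshold_chunk, &ck[k]);
            }
            for (int k = 0; k < NTHREADS; k++)
                pthread_join(tid[k], NULL);
        }

        int main(void)
        {
            threshold_omp(128);
            threshold_pthreads(128);
            return 0;
        }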

  19. Health information technology: help or hindrance?

    PubMed

    Ketchersid, Terry

    2014-07-01

    The practice of medicine in general and nephrology in particular grows increasingly complex with each passing year. In parallel with this trend, the purchasers of health care are slowly shifting the reimbursement paradigm from one based on rewarding transactions, or work performed, to one that rewards value delivered. Within this context, the health-care value equation is broadly defined as quality divided by costs. Health information technology has been widely recognized as 1 of the foundations for delivering better care at lower costs. As the largest purchaser of health care in the world, the Centers for Medicare and Medicaid Services has deployed a series of interrelated programs designed to spur the adoption and utilization of health information technology. This review examines our known collective experience in the practice of nephrology to date with several of these programs and attempts to answer the following question: Is health information technology helping or hindering the delivery of value to the nation's health-care system? Through this review, it was concluded overall that the effect of health information technology appears positive; however, it cannot be objectively determined because of the infancy of its utilization in the practice of medicine. Copyright © 2014 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.

  20. The Core Competencies and MFT Education: Practical Aspects of Transitioning to a Learning-Centered, Outcome-Based Pedagogy

    ERIC Educational Resources Information Center

    Gehart, Diane

    2011-01-01

    The MFT core competencies and latest COAMFTE accreditation standards usher in a new paradigm for MFT education. This transition necessitates not only measuring student mastery of competencies but also, more importantly, adopting a contemporary pedagogical model. This article provides an overview of the changes, a review of parallel trends in other…

  1. Anti-Cheating Crusader Vexes Some Professors: Software Kingpin Says Using his Product Would Cure Plagiarism Blight

    ERIC Educational Resources Information Center

    Read, Brock

    2008-01-01

    A parallel between plagiarism and corporate crime raises eyebrows--and ire-- on campuses, but for John Barrie, the comparison is a perfectly natural one. In the 10 years since he founded iParadigms, which sells the antiplagiarism software Turnitin, he has argued--forcefully, and at times combatively--that academic plagiarism is growing, and that…

  2. Serial and Parallel Processes in Working Memory after Practice

    ERIC Educational Resources Information Center

    Oberauer, Klaus; Bialkova, Svetlana

    2011-01-01

    Six young adults practiced for 36 sessions on a working-memory updating task in which 2 digits and 2 spatial positions were continuously updated. Participants either did 1 updating operation at a time, or attempted 1 numerical and 1 spatial operation at the same time. In contrast to previous research using the same paradigm with a single digit and…

  3. A Theoretical Analysis of the Perceptual Span based on SWIFT Simulations of the n + 2 Boundary Paradigm

    PubMed Central

    Risse, Sarah; Hohenstein, Sven; Kliegl, Reinhold; Engbert, Ralf

    2014-01-01

    Eye-movement experiments suggest that the perceptual span during reading is larger than the fixated word, asymmetric around the fixation position, and shrinks in size contingent on the foveal processing load. We used the SWIFT model of eye-movement control during reading to test these hypotheses and their implications under the assumption of graded parallel processing of all words inside the perceptual span. Specifically, we simulated reading in the boundary paradigm and analysed the effects of denying the model to have valid preview of a parafoveal word n + 2 two words to the right of fixation. Optimizing the model parameters for the valid preview condition only, we obtained span parameters with remarkably realistic estimates conforming to the empirical findings on the size of the perceptual span. More importantly, the SWIFT model generated parafoveal processing up to word n + 2 without fitting the model to such preview effects. Our results suggest that asymmetry and dynamic modulation are plausible properties of the perceptual span in a parallel word-processing model such as SWIFT. Moreover, they seem to guide the flexible distribution of processing resources during reading between foveal and parafoveal words. PMID:24771996

  4. Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior

    NASA Technical Reports Server (NTRS)

    Gelenbe, Erol

    1988-01-01

    An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.
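
    For reference, the classical statement of Amdahl's Law that the note amends can be written as follows, where f is the perfectly parallelizable fraction of the work and N the number of processors (the paper's amended, imbalance-aware activity-set form is not reproduced here):

        % Classical Amdahl's Law: speedup with N processors when a fraction
        % f of the work parallelizes perfectly and (1 - f) remains serial.
        S(N) = \frac{1}{(1 - f) + \dfrac{f}{N}},
        \qquad
        \lim_{N \to \infty} S(N) = \frac{1}{1 - f}.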

  5. Experiences with hypercube operating system instrumentation

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Rudolph, David C.

    1989-01-01

    The difficulties in conceptualizing the interactions among a large number of processors make it difficult both to identify the sources of inefficiencies and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed, necessary for large-scale parallel computers; the enormous volume of performance data mandates visual display.

  6. Communications oriented programming of parallel iterative solutions of sparse linear systems

    NASA Technical Reports Server (NTRS)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.

  7. Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

    PubMed

    Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

    2013-10-24

    Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (<100 ms) moved to the center only one saccade later. Our data on eye movement positions provide novel evidence of parallel programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.

  8. PyPele Rewritten To Use MPI

    NASA Technical Reports Server (NTRS)

    Hockney, George; Lee, Seungwon

    2008-01-01

    A computer program known as PyPele, originally written as a Python-language extension module of a C++ program, has been rewritten in pure Python. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses the Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message-passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without the need to rewrite those programs.
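
    The coordination pattern described here, with one rank dispatching work and collecting results over MPI rather than over SSH channels, looks roughly like the following C sketch. It is illustrative only; PyPele itself is written in Python, and the squaring "task" is a stand-in for a design-analysis job:

        /* Minimal MPI dispatch/collect sketch: rank 0 scatters one task
           parameter to every rank and gathers the results. */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            /* rank 0 prepares one task parameter per process
               (assumes at most 64 ranks for simplicity) */
            double params[64] = {0}, my_param, my_result, results[64];
            if (rank == 0)
                for (int i = 0; i < size; i++) params[i] = 100.0 + i;

            MPI_Scatter(params, 1, MPI_DOUBLE, &my_param, 1, MPI_DOUBLE,
                        0, MPI_COMM_WORLD);

            my_result = my_param * my_param;   /* stand-in for a real task */

            MPI_Gather(&my_result, 1, MPI_DOUBLE, results, 1, MPI_DOUBLE,
                       0, MPI_COMM_WORLD);

            if (rank == 0)
                for (int i = 0; i < size; i++)
                    printf("task %d -> %g\n", i, results[i]);

            MPI_Finalize();
            return 0;
        }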

  9. Kappa and Rater Accuracy: Paradigms and Parameters.

    PubMed

    Conger, Anthony J

    2017-12-01

    Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa (κ). Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another (concordance), using both nonstochastic and stochastic category membership. Using a probability model to express category assignments in terms of rater accuracy and random error, it is shown that observed agreement (Po) depends only on rater accuracy and number of categories; however, expected agreement (Pe) and κ depend additionally on category frequencies. Moreover, category frequencies affect Pe and κ solely through the variance of the category proportions, regardless of the specific frequencies underlying the variance. Paradoxically, some judgment paradigms involving stochastic categories are shown to yield higher κ values than their nonstochastic counterparts. Using the stated probability model, assignments to categories were generated for 552 combinations of paradigms, rater and category parameters, category frequencies, and number of stimuli. Observed means and standard errors for Po, Pe, and κ were fully consistent with theory expectations. Guidelines for interpretation of rater accuracy and reliability are offered, along with a discussion of alternatives to the basic model.
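
    For reference, the standard definitions the article builds on are as follows, with P_o the observed agreement and P_e the chance agreement computed from the two raters' category marginal proportions p_k and q_k over K categories:

        % Cohen's kappa: chance-corrected agreement between two raters.
        \kappa = \frac{P_o - P_e}{1 - P_e},
        \qquad
        P_e = \sum_{k=1}^{K} p_k \, q_k .

    The article's observation that category frequencies act on P_e and kappa only through the variance of these proportions follows from this form, since the sum depends on the spread of the p_k and q_k around 1/K.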

  10. Voluntary saccadic eye movements in humans studied with a double-cue paradigm.

    PubMed

    Sheliga, B M; Brown, V J; Miles, F A

    2002-07-01

    In the classic double-step paradigm, subjects are required to make a saccade to a visual target that is briefly presented at one location and then shifted to a new location before the subject has responded. The saccades in this situation are "reflexive" in that they are made in response to the appearance of the target itself. In the present experiments we adapted the double-step paradigm to study "voluntary" saccades. For this, several identical targets were always visible and subjects were given a cue to indicate that they should make a saccade to one of them. This cue was then changed to indicate another of the targets before the subject had responded: double-cue (DC) paradigm. The saccadic eye movements in our DC paradigm had many features in common with those in the double-step paradigm and we show that apparent differences can be attributed to the spatio-temporal arrangements of the cues/targets rather than to any intrinsic differences in the programming of these two kinds of eye movements. For example, a feature of our DC paradigm that is not seen in the usual double-step paradigm is that the second cue could cause transient delays of the initial saccade, and these delays still occurred when the second cue was reflexive--provided that it was at the fovea (as in our DC paradigm) and not in the periphery (as in the usual double-step paradigm). Thus, the critical factor for the delay was the retinal (foveal) location of the second cue/target--not whether it was cognitive or reflexive--and we argue that the second cue/target is here acting as a distractor. We conclude that the DC paradigm can be used to study the programming of voluntary saccades in the same way that the double-step paradigm can be used to study reflexive saccades.

  11. A survey of parallel programming tools

    NASA Technical Reports Server (NTRS)

    Cheng, Doreen Y.

    1991-01-01

    This survey examines 39 parallel programming tools. Focus is placed on those tool capabilities needed for parallel scientific programming rather than for general computer science. The tools are classified with the current and future needs of the Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions; tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.

  12. Backtracking and Re-execution in the Automatic Debugging of Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Matthews, Gregory; Hood, Robert; Johnson, Stephen; Leggett, Peter; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we describe a new approach using relative debugging to find differences in computation between a serial program and a parallel version of that program. We use a combination of re-execution and backtracking in order to find the first difference in computation that may ultimately lead to an incorrect value that the user has indicated. In our prototype implementation we use static analysis information from a parallelization tool in order to perform the backtracking as well as the mapping required between serial and parallel computations.

  13. Revised Extended Grid Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martz, Roger L.

    The Revised Eolus Grid Library (REGL) is a mesh-tracking library that was developed for use with the MCNP6™ computer code so that (radiation) particles can track on an unstructured mesh. The unstructured mesh is a finite element representation of any geometric solid model created with a state-of-the-art CAE/CAD tool. The mesh-tracking library is written using modern Fortran and programming standards; the library is Fortran 2003 compliant. The library was created with a defined application programmer interface (API) so that it could easily integrate with other particle tracking/transport codes. The library does not handle parallel processing via the Message Passing Interface (MPI), but has been used successfully where the host code handles the MPI calls. The library is thread-safe and supports the OpenMP paradigm. As a library, all features are available through the API, and overall a tight coupling between it and the host code is required. Features of the library are summarized in the following list: can accommodate first and second order 4, 5, and 6-sided polyhedra; any combination of element types may appear in a single geometry model; parts may not contain tetrahedra mixed with other element types; pentahedra and hexahedra can be together in the same part; robust handling of overlaps and gaps; tracks element-to-element to produce path-length results at the element level; finds element numbers for a given mesh location; finds intersection points on element faces for the particle tracks; produces a data file for post-processing results analysis; reads Abaqus .inp input (ASCII) files to obtain information for the global mesh-model; supports parallel input processing via MPI; and supports parallel particle transport by both MPI and OpenMP.

  14. Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems

    PubMed Central

    Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.

    2014-01-01

    The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545

  15. Space transportation alternatives for large space programs - The International Space University summer session - 1992

    NASA Technical Reports Server (NTRS)

    Palaszewski, Bryan

    1993-01-01

    The issues discussed in this paper are the result of a 10-week study by the Space Solar Power Program design project members and the Space Transportation Group at the International Space University (ISU) summer session of 1992 to investigate new paradigms in space propulsion and how those paradigms might reduce the costs for large space programs. The program plan was to place a series of power satellites in Earth orbit. Several designs were studied where many kW, MW or GW of power would be transmitted to Earth or to other spacecraft in orbit. During the summer session, a space solar power system was also detailed and analyzed. At ISU, the focus of the study was to foster and develop some of the new paradigms that may eliminate the barriers to low cost for space exploration and exploitation. Many international and technical aspects of a large multinational program were studied. Environmental safety, space construction and maintenance, legal and policy issues of frequency allocation, technology transfer and control and many other areas were addressed.

  16. 20th Space Simulation Conference: The Changing Testing Paradigm

    NASA Technical Reports Server (NTRS)

    Stecher, Joseph L., III (Compiler)

    1998-01-01

    The Institute of Environmental Sciences' Twentieth Space Simulation Conference, "The Changing Testing Paradigm" provided participants with a forum to acquire and exchange information on the state-of-the-art in space simulation, test technology, atomic oxygen, program/system testing, dynamics testing, contamination, and materials. The papers presented at this conference and the resulting discussions carried out the conference theme "The Changing Testing Paradigm."

  17. 20th Space Simulation Conference: The Changing Testing Paradigm

    NASA Technical Reports Server (NTRS)

    Stecher, Joseph L., III (Compiler)

    1999-01-01

    The Institute of Environmental Sciences and Technology's Twentieth Space Simulation Conference, "The Changing Testing Paradigm" provided participants with a forum to acquire and exchange information on the state-of-the-art in space simulation, test technology, atomic oxygen, program/system testing, dynamics testing, contamination, and materials. The papers presented at this conference and the resulting discussions carried out the conference theme "The Changing Testing Paradigm."

  18. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. We report measurement-based performance of these parallelized benchmarks from four perspectives: efficacy of the parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
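
    Directive-based parallelization of the kind measured here keeps the sequential loop nest and adds compiler directives. The C/OpenMP analogue below is illustrative only (the study used native Fortran77 directives on an Origin2000); it includes a parallel first-touch initialization, one common way of addressing the ccNUMA data-locality overhead the paper identifies:

        /* Directive-based parallelization of a dense matrix-vector product. */
        #include <stdio.h>

        #define N 2048

        static double a[N][N], x[N], y[N];

        int main(void)
        {
            /* first-touch initialization in parallel: on ccNUMA/DSM systems
               this tends to place each page near the thread that later uses it */
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < N; i++) {
                x[i] = 1.0;
                for (int j = 0; j < N; j++) a[i][j] = 1.0 / (i + j + 1);
            }

            /* the computational kernel, parallelized by one directive */
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < N; i++) {
                double s = 0.0;
                for (int j = 0; j < N; j++) s += a[i][j] * x[j];
                y[i] = s;
            }

            printf("y[0] = %f\n", y[0]);
            return 0;
        }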

  19. An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

    2015-01-01

    This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
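
    Standard single-device OpenACC, the base model the proposed SPMD extension builds on, looks like the following C sketch; the paper's multi-accelerator extensions are not shown, and the kernel is a generic SAXPY chosen for illustration:

        /* Directive-based accelerator offload with plain OpenACC. */
        #include <stdio.h>

        #define N (1 << 20)

        int main(void)
        {
            static float x[N], y[N];
            for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

            float a = 3.0f;
            /* copy x in, copy y in and back out; run the loop on the device */
            #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
            for (int i = 0; i < N; i++)
                y[i] = a * x[i] + y[i];   /* SAXPY */

            printf("y[0] = %f\n", y[0]);  /* expect 5.0 */
            return 0;
        }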

  20. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems.

    PubMed

    Stone, John E; Gohara, David; Shi, Guochun

    2010-05-01

    We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures.
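
    The programming model described, with a kernel compiled at run time and executed by work-items indexed through get_global_id, can be sketched as a minimal vector addition. Error checking is omitted for brevity, and the host path assumes a default OpenCL platform and device are present:

        /* Minimal OpenCL host + kernel: vector addition. */
        #include <stdio.h>
        #ifdef __APPLE__
        #include <OpenCL/opencl.h>
        #else
        #include <CL/cl.h>
        #endif

        static const char *src =
            "__kernel void vadd(__global const float *a,\n"
            "                   __global const float *b,\n"
            "                   __global float *c) {\n"
            "    size_t i = get_global_id(0);\n"
            "    c[i] = a[i] + b[i];\n"
            "}\n";

        int main(void)
        {
            enum { N = 1024 };
            float a[N], b[N], c[N];
            for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0f * i; }

            cl_platform_id plat;  cl_device_id dev;
            clGetPlatformIDs(1, &plat, NULL);
            clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
            cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
            cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

            /* compile the kernel source at run time, as the model prescribes */
            cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
            clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
            cl_kernel k = clCreateKernel(prog, "vadd", NULL);

            cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                       sizeof a, a, NULL);
            cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                       sizeof b, b, NULL);
            cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

            clSetKernelArg(k, 0, sizeof da, &da);
            clSetKernelArg(k, 1, sizeof db, &db);
            clSetKernelArg(k, 2, sizeof dc, &dc);

            size_t global = N;   /* one work-item per element */
            clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
            clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

            printf("c[10] = %f\n", c[10]);   /* expect 30.0 */
            return 0;
        }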

  1. Watch-and-Comment as an Approach to Collaboratively Annotate Points of Interest in Video and Interactive-TV Programs

    NASA Astrophysics Data System (ADS)

    Pimentel, Maria Da Graça C.; Cattelan, Renan G.; Melo, Erick L.; Freitas, Giliard B.; Teixeira, Cesar A.

    In earlier work we proposed the Watch-and-Comment (WaC) paradigm as the seamless capture of multimodal comments made by one or more users while watching a video, resulting in the automatic generation of multimedia documents specifying annotated interactive videos. The aim is to allow services to be offered by applying document engineering techniques to the multimedia document generated automatically. The WaC paradigm was demonstrated with a WaCTool prototype application which supports multimodal annotation over video frames and segments, producing a corresponding interactive video. In this chapter, we extend the WaC paradigm to consider contexts in which several viewers may use their own mobile devices while watching and commenting on an interactive-TV program. We first review our previous work. Next, we discuss scenarios in which mobile users can collaborate via the WaC paradigm. We then present a new prototype application which allows users to employ their mobile devices to collaboratively annotate points of interest in video and interactive-TV programs. We also detail the current software infrastructure which supports our new prototype; the infrastructure extends the Ginga middleware for the Brazilian Digital TV with an implementation of the UPnP protocol - the aim is to provide the seamless integration of the users' mobile devices into the TV environment. As a result, the work reported in this chapter defines the WaC paradigm for the mobile-user as an approach to allow the collaborative annotation of the points of interest in video and interactive-TV programs.

  2. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems. The algorithms that may lead to their effective parallelization were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has also been researched. Two experimental vehicles were developed to facilitate this research: Backpac, a parallel backward chained rule-based reasoning system, and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct future. Applying future to a function causes that function to be evaluated as a task running in parallel with the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32-processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10-processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped-down version of EMYCIN.

  3. Rubus: A compiler for seamless and extensible parallelism.

    PubMed

    Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general-purpose use in computers and mobile devices. However, traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write parallel code. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug, and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimization. This paper proposes a new open-source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring a programmer's expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores, while for a matrix multiplication benchmark Rubus achieved an average execution speedup of 84 times on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.

  4. Rubus: A compiler for seamless and extensible parallelism

    PubMed Central

    Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general-purpose use in computers and mobile devices. However, traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write parallel code. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug, and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimization. This paper proposes a new open-source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring a programmer’s expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores, while for a matrix multiplication benchmark Rubus achieved an average execution speedup of 84 times on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758

  5. Efficient partitioning and assignment of programs for multiprocessor execution

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1993-01-01

    The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics, developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.

  6. A mechanism for efficient debugging of parallel programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, B.P.; Choi, J.D.

    1988-01-01

    This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. They extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions of the cooperating processes.

  7. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

    PubMed Central

    Stone, John E.; Gohara, David; Shi, Guochun

    2010-01-01

    We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures. PMID:21037981

  8. Genetic algorithms using SISAL parallel programming language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tejada, S.

    1994-05-06

    Genetic algorithms are a mathematical optimization technique developed by John Holland at the University of Michigan [1]. The SISAL programming language possesses many of the characteristics desired to implement genetic algorithms. SISAL is a deterministic, functional programming language which is inherently parallel. Because SISAL is functional and based on mathematical concepts, genetic algorithms can be efficiently translated into the language. Several of the steps involved in genetic algorithms, such as mutation, crossover, and fitness evaluation, can be parallelized using SISAL. In this paper I discuss the implementation and performance of parallel genetic algorithms in SISAL.
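
    Of the GA steps named above, fitness evaluation parallelizes most directly because every population member is independent. SISAL would express this as a functional loop; the imperative C/OpenMP sketch below only illustrates that same data-parallel structure on a toy one-max problem, and is not taken from the paper:

        /* Parallel fitness evaluation: each individual is independent. */
        #include <stdio.h>
        #include <stdlib.h>

        #define POP   256
        #define GENES 64

        static int    pop[POP][GENES];   /* bitstring individuals */
        static double fit[POP];

        /* toy fitness: count of 1-bits (the "one-max" problem) */
        static double fitness(const int *g)
        {
            int s = 0;
            for (int i = 0; i < GENES; i++) s += g[i];
            return (double)s;
        }

        int main(void)
        {
            for (int i = 0; i < POP; i++)
                for (int j = 0; j < GENES; j++)
                    pop[i][j] = rand() & 1;

            /* independent evaluations, so the loop parallelizes freely */
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < POP; i++)
                fit[i] = fitness(pop[i]);

            double best = fit[0];
            for (int i = 1; i < POP; i++)
                if (fit[i] > best) best = fit[i];
            printf("best fitness = %f\n", best);
            return 0;
        }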

  9. An Expert System for the Development of Efficient Parallel Code

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  10. Optics Program Modified for Multithreaded Parallel Computing

    NASA Technical Reports Server (NTRS)

    Lou, John; Bedding, Dave; Basinger, Scott

    2006-01-01

    A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS, based on pthreads (POSIX threads, where "POSIX" denotes the Portable Operating System Interface for UNIX). In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
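
    MACOS itself is not shown in the record; a minimal sketch of the kind of OpenMP transformation the abstract describes, parallelizing an outer loop over independent rays, might look like this (the Ray/Hit types and trace_ray kernel are hypothetical):

      typedef struct { double org[3], dir[3]; } Ray;
      typedef struct { double t; int surface; } Hit;

      extern Hit trace_ray(const Ray *r);   /* hypothetical per-ray kernel */

      void trace_all(const Ray *rays, Hit *hits, int n) {
          /* Rays do not interact, so each iteration is independent;
             that independence is why the reported ray-tracing speedup
             could be linear in the processor count.  A dynamic schedule
             balances rays of uneven cost across threads. */
          #pragma omp parallel for schedule(dynamic, 64)
          for (int i = 0; i < n; i++)
              hits[i] = trace_ray(&rays[i]);
      }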

  11. Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications

    PubMed Central

    2014-01-01

    Background The huge quantity of data produced in Biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard-to-solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal) that form a probability distribution, allowing the selection of the term literals. The great versatility that characterizes it makes U-BRAIN applicable in many of the fields in which there are data to be analyzed. However, the memory and the execution time required are of order O(n^3) and O(n^5), respectively, and so the algorithm is unaffordable for huge data sets. Results We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of the communications between different memories (RAM, cache, mass, virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as an SPMD system combined with a message-passing programming paradigm. Here, we adopt the high-level message-passing system MPI (Message Passing Interface) in its version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration, and mapping. The decomposition of the U-BRAIN algorithm requires the design of a communication protocol among the processors involved. Efficient synchronization design is also discussed. Conclusions In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice-junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset), and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach their best values at about 90 processors; beyond that, the parallelization advantage is offset by the greater cost of non-local communication between the processors. Similar behaviour is evident on HS3D, but at a greater number of processors, evidencing the direct relationship between data size and parallelization gain. This behaviour is confirmed on COSMIC. Overall, the results obtained show that the parallel version is up to 30 times faster than the serial one. PMID:25077818
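
    The published implementation uses MPJ (MPI for Java) and is not reproduced in the record; as a language-neutral sketch of the partitioning stage alone, a C/MPI block distribution of training instances across ranks might look as follows (the instance count is a placeholder):

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          int n_instances = 100000;   /* placeholder data set size */
          /* Block partitioning: each rank owns a contiguous slice,
             with the remainder spread over the first ranks. */
          int base  = n_instances / size, rem = n_instances % size;
          int mine  = base + (rank < rem ? 1 : 0);
          int first = rank * base + (rank < rem ? rank : rem);

          printf("rank %d: instances [%d, %d)\n", rank, first, first + mine);

          /* ... local term computation would follow here, then a
             collective exchange of candidate literals (the
             communication stage of the four-stage scheme) ... */
          MPI_Finalize();
          return 0;
      }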

  12. Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications.

    PubMed

    D'Angelo, Gianni; Rampone, Salvatore

    2014-01-01

    The huge quantity of data produced in Biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard-to-solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal) that form a probability distribution, allowing the selection of the term literals. The great versatility that characterizes it makes U-BRAIN applicable in many of the fields in which there are data to be analyzed. However, the memory and the execution time required are of order O(n^3) and O(n^5), respectively, and so the algorithm is unaffordable for huge data sets. We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of the communications between different memories (RAM, cache, mass, virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as an SPMD system combined with a message-passing programming paradigm. Here, we adopt the high-level message-passing system MPI (Message Passing Interface) in its version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration, and mapping. The decomposition of the U-BRAIN algorithm requires the design of a communication protocol among the processors involved. Efficient synchronization design is also discussed. In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice-junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset), and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach their best values at about 90 processors; beyond that, the parallelization advantage is offset by the greater cost of non-local communication between the processors. Similar behaviour is evident on HS3D, but at a greater number of processors, evidencing the direct relationship between data size and parallelization gain. This behaviour is confirmed on COSMIC. Overall, the results obtained show that the parallel version is up to 30 times faster than the serial one.

  13. Effects of Bright Light Therapy of Sleep, Cognition, Brain Function, and Neurochemistry in Mild Traumatic Brain Injury

    DTIC Science & Technology

    2012-01-01

    computerized stimulation paradigms for use during functional neuroimaging (i.e., MSIT). Accomplishments: • The following computer tasks were...and Stability Test. • Programming of all computerized functional MRI stimulation paradigms and assessment tasks using E-prime software was completed...Computer stimulation paradigms were tested in the scanner environment to ensure that they could be presented and seen by subjects in the scanner

  14. Solving Integer Programs from Dependence and Synchronization Problems

    DTIC Science & Technology

    1993-03-01

    Solving Integer Programs from Dependence and Synchronization Problems. Jaspal Subhlok, March 1993, CMU-CS-93-130, School of Computer Science. ...the method is an exact and efficient way of solving integer programming problems arising in dependence and synchronization analysis of parallel programs... Keywords: exact dependence testing, integer programming, parallelizing compilers, parallel program analysis, synchronization analysis

  15. The FORCE - A highly portable parallel programming language

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
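
    The FORCE itself is Fortran-based and its macro sources are not reproduced here; as a rough C-flavored illustration of the two-level idea, a single portable construct can be mapped onto different machine-specific primitives at build time (the vendor primitive name under MACHINE_A is invented):

      /* Layer 1: machine-dependent primitives, selected at build time. */
      #include <pthread.h>

      extern pthread_barrier_t global_barrier;  /* assumed set up at startup */

      #ifdef MACHINE_A
      extern void machine_a_hw_barrier(void);   /* invented vendor primitive */
      #  define BARRIER_IMPL()  machine_a_hw_barrier()
      #else
      #  define BARRIER_IMPL()  pthread_barrier_wait(&global_barrier)
      #endif

      /* Layer 2: machine-independent construct used by application code. */
      #define Barrier  BARRIER_IMPL()

      void relax_step(double *u, int n) {
          /* ... update this process's portion of u[0..n) ... */
          Barrier;   /* all processes synchronize before the next sweep */
      }

    Application code written against the Layer 2 construct never mentions the machine, which is the portability property the paper attributes to the FORCE's preprocessor.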

  16. The FORCE: A highly portable parallel programming language

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.

  17. The war on obesity: a social determinant of health.

    PubMed

    O'Hara, Lily; Gregg, Jane

    2006-12-01

    The weight-centred health paradigm is an important contributor to the broader cultural paradigm in which corpulence is eschewed in favour of leanness. The desire to reduce body fat or weight, or to prevent gaining 'excess' fat, is driven by both aesthetic and health ideals. The 'war on obesity' is a broad health-based set of policies and programs designed to problematise 'excess' body fat and create solutions to the 'problem'. There is a substantial body of literature that claims to demonstrate the harmful effects of 'excess' body fat. Recent critiques of 'obesity prevention' programs have highlighted the importance of focusing on environmental changes rather than individuals, due in part to the risk of harmful consequences associated with individualistic, victim-blaming approaches. Beyond this, there are suggestions that framing body weight as the source of health problems - known as the weight-centred health paradigm - is in itself a harmful approach. The range of harms includes body dissatisfaction, dieting, disordered eating, discrimination and death. Health promotion policies and programs that operate within the weight-centred paradigm have the potential to have a negative impact on the health and well-being of individuals and communities.

  18. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    DOE PAGES

    Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
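
    A minimal way to observe work time inflation in one's own OpenMP code (our sketch, not the paper's tooling) is to sum per-thread busy time, excluding barrier waits, and compare it against the single-thread time for the same loop (do_chunk is a hypothetical unit of work):

      #include <omp.h>
      #include <stdio.h>

      extern void do_chunk(int i);   /* hypothetical unit of work */

      int main(void) {
          int n = 1 << 16;
          double busy = 0.0;

          #pragma omp parallel reduction(+:busy)
          {
              double t0 = omp_get_wtime();
              /* nowait: exclude time idling at the loop-end barrier,
                 so the measured window is pure loop work. */
              #pragma omp for schedule(static) nowait
              for (int i = 0; i < n; i++)
                  do_chunk(i);
              busy += omp_get_wtime() - t0;
          }
          /* If 'busy' noticeably exceeds the one-thread time for the
             same loop, the excess is work time inflation (e.g., remote
             NUMA accesses), not idleness or scheduling overhead. */
          printf("summed busy time: %f s\n", busy);
          return 0;
      }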

  19. Distributed and parallel Ada and the Ada 9X recommendations

    NASA Technical Reports Server (NTRS)

    Volz, Richard A.; Goldsack, Stephen J.; Theriault, R.; Waldrop, Raymond S.; Holzbacher-Valero, A. A.

    1992-01-01

    Recently, the DoD has sponsored work towards a new version of Ada, intended to support the construction of distributed systems. The revised version, often called Ada 9X, will become the new standard sometime in the 1990s. It is intended that Ada 9X should provide language features giving limited support for distributed system construction. The requirements for such features are given. Many of the most advanced computer applications involve embedded systems that are comprised of parallel processors or networks of distributed computers. If Ada is to become the widely adopted language envisioned by many, it is essential that suitable compilers and tools be available to facilitate the creation of distributed and parallel Ada programs for these applications. The major language issues impacting distributed and parallel programming are reviewed, and some principles upon which distributed/parallel language systems should be built are suggested. Based upon these, alternative language concepts for distributed/parallel programming are analyzed.

  20. Implementation and performance of parallel Prolog interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, S.; Kale, L.V.; Balkrishna, R.

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines including shared memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.

  1. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele

    2001-01-01

    This viewgraph presentation provides information on the support available for the automatic parallelization of computer programs. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence ensuring that the code transformation was accurate.

  2. A proposed research program in information processing

    NASA Technical Reports Server (NTRS)

    Schorr, Herbert

    1992-01-01

    The goal of the Formalized Software Development (FSD) project was to demonstrate improvements in the productivity of software development and maintenance through the use of a new software lifecycle paradigm. The paradigm calls for the mechanical, but human-guided, derivation of software implementations from formal specifications of the desired software behavior. It relies on altering a system's specification and rederiving its implementation as the standard technology for software maintenance. A system definition for this paradigm is composed of a behavioral specification together with a body of annotations that control the derivation of executable code from the specification. Annotations generally achieve the selection of certain data representations and/or algorithms that are consistent with, but not mandated by, the behavioral specification. In doing this, they may yield systems which exhibit only certain behaviors among multiple alternatives permitted by the behavioral specification. The FSD project proposed to construct a testbed in which to explore the realization of this new paradigm. The testbed was to provide an operational support environment for software design, implementation, and maintenance. The testbed was proposed to provide highly automated support for individual programmers ('programming in the small'), but not to address the additional needs of programming teams ('programming in the large'). The testbed proposed to focus on supporting rapid construction and evolution of useful prototypes of software systems, as opposed to focusing on the problems of achieving production quality performance of systems.

  3. Aging of perennial cells and organ parts according to the programmed aging paradigm.

    PubMed

    Libertini, Giacinto; Ferrara, Nicola

    2016-04-01

    If aging is a physiological phenomenon, as maintained by the programmed aging paradigm, it must be caused by specific genetically determined and regulated mechanisms, which must be confirmed by evidence. Within the programmed aging paradigm, a complete proposal starts from the observation that cells, tissues, and organs show continuous turnover: as telomere shortening determines both limits to cell replication and a progressive impairment of cellular functions, a progressive age-related fitness decline (i.e., aging) is a clear consequence. Against this hypothesis, a critic might argue that there are cells (most types of neurons) and organ parts (crystalline core and tooth enamel) that have no turnover and are subject to wear or manifest alterations similar to those of cells with turnover. In this review, it is shown how cell types without turnover appear to be strictly dependent on cells subjected to turnover. The loss or weakening of the functions fulfilled by these cells with turnover, due to telomere shortening and turnover slowing, compromises the vitality of the served cells without turnover. This determines well-known clinical manifestations, which in their early forms are described as distinct diseases (e.g., Alzheimer's disease, Parkinson's disease, age-related macular degeneration, etc.). Moreover, for the two organ parts (crystalline core and tooth enamel) without viable cells or any cell turnover, it is discussed how this is entirely compatible with the programmed aging paradigm.

  4. Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

    NASA Astrophysics Data System (ADS)

    Work, Paul R.

    1991-12-01

    This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.

  5. New Trends in E-Science: Machine Learning and Knowledge Discovery in Databases

    NASA Astrophysics Data System (ADS)

    Brescia, Massimo

    2012-11-01

    Data mining, or Knowledge Discovery in Databases (KDD), while being the main methodology to extract the scientific information contained in Massive Data Sets (MDS), needs to tackle crucial problems since it has to orchestrate complex challenges posed by transparent access to different computing environments, scalability of algorithms, and reusability of resources. To achieve a leap forward for the progress of e-science in the data avalanche era, the community needs to implement an infrastructure capable of performing data access, processing and mining in a distributed but integrated context. The increasing complexity of modern technologies has led to a huge production of data, whose warehouse management, together with the need to optimize analysis and mining procedures, has changed how modern science is conceived. Classical data exploration, based on a user's own local data storage and limited computing infrastructure, is no longer efficient in the case of MDS that are spread worldwide over inhomogeneous data centres and require teraflop processing power. In this context modern experimental and observational science requires a good understanding of computer science, network infrastructures, Data Mining, etc., i.e. of all those techniques which fall into the domain of the so-called e-science (recently assessed also by the Fourth Paradigm of Science). Such understanding is almost completely absent in the older generations of scientists, and this is reflected in the inadequacy of most academic and research programs. A paradigm shift is needed: statistical pattern recognition, object oriented programming, distributed computing, and parallel programming need to become an essential part of the scientific background. A possible practical solution is to provide the research community with easy-to-understand, easy-to-use tools, based on Web 2.0 technologies and Machine Learning methodology: tools where almost all the complexity is hidden from the final user, but which are still flexible and able to produce efficient and reliable scientific results. All these considerations will be described in detail in the chapter. Moreover, examples of modern applications offering a wide variety of e-science communities a large spectrum of computational facilities to exploit the wealth of available massive data sets and powerful machine learning and statistical algorithms will also be introduced.

  6. Thermodynamic and Kinematic Flow Characteristics of Some Developing and Non-Developing Disturbances in Predict

    DTIC Science & Technology

    2014-12-01

    normal (S1) and parallel (S2) strain rates squared. U and V are the zonal and meridional velocities, and the x and y subscripts indicate partial... between developing and non-developing tropical disturbances appears to lie with the kinematic flow boundary structure and thermodynamic properties hypothesized in the marsupial paradigm

  7. Combinatorial development of antibacterial Zr-Cu-Al-Ag thin film metallic glasses.

    PubMed

    Liu, Yanhui; Padmanabhan, Jagannath; Cheung, Bettina; Liu, Jingbei; Chen, Zheng; Scanley, B Ellen; Wesolowski, Donna; Pressley, Mariyah; Broadbridge, Christine C; Altman, Sidney; Schwarz, Udo D; Kyriakides, Themis R; Schroers, Jan

    2016-05-27

    Metallic alloys are normally composed of multiple constituent elements in order to achieve the integration of the plurality of properties required in technological applications. However, the conventional alloy development paradigm, based on a sequential trial-and-error approach, requires completely unrelated strategies to optimize compositions out of a vast phase space, making alloy development time consuming and labor intensive. Here, we challenge the conventional paradigm by proposing a combinatorial strategy that enables parallel screening of a multitude of alloys. Utilizing a typical metallic glass forming alloy system, Zr-Cu-Al-Ag, as an example, we demonstrate how glass formation and antibacterial activity, two unrelated properties, can be simultaneously characterized and the optimal composition efficiently identified. We found that in the Zr-Cu-Al-Ag alloy system a fully glassy phase can be obtained in a wide compositional range by co-sputtering, and that antibacterial activity is strongly dependent on alloy composition. Our results indicate that antibacterial activity is sensitive to Cu and Ag while remaining essentially unchanged within a wide range of Zr and Al. The proposed strategy not only facilitates the development of high-performing alloys, but also provides a tool to unveil the composition dependence of properties in a highly parallel fashion, which helps the development of new materials by design.

  8. Combinatorial development of antibacterial Zr-Cu-Al-Ag thin film metallic glasses

    NASA Astrophysics Data System (ADS)

    Liu, Yanhui; Padmanabhan, Jagannath; Cheung, Bettina; Liu, Jingbei; Chen, Zheng; Scanley, B. Ellen; Wesolowski, Donna; Pressley, Mariyah; Broadbridge, Christine C.; Altman, Sidney; Schwarz, Udo D.; Kyriakides, Themis R.; Schroers, Jan

    2016-05-01

    Metallic alloys are normally composed of multiple constituent elements in order to achieve the integration of the plurality of properties required in technological applications. However, the conventional alloy development paradigm, based on a sequential trial-and-error approach, requires completely unrelated strategies to optimize compositions out of a vast phase space, making alloy development time consuming and labor intensive. Here, we challenge the conventional paradigm by proposing a combinatorial strategy that enables parallel screening of a multitude of alloys. Utilizing a typical metallic glass forming alloy system, Zr-Cu-Al-Ag, as an example, we demonstrate how glass formation and antibacterial activity, two unrelated properties, can be simultaneously characterized and the optimal composition efficiently identified. We found that in the Zr-Cu-Al-Ag alloy system a fully glassy phase can be obtained in a wide compositional range by co-sputtering, and that antibacterial activity is strongly dependent on alloy composition. Our results indicate that antibacterial activity is sensitive to Cu and Ag while remaining essentially unchanged within a wide range of Zr and Al. The proposed strategy not only facilitates the development of high-performing alloys, but also provides a tool to unveil the composition dependence of properties in a highly parallel fashion, which helps the development of new materials by design.

  9. Systems Factorial Technology provides new insights on global-local information processing in autism spectrum disorders.

    PubMed

    Johnson, Shannon A; Blaha, Leslie M; Houpt, Joseph W; Townsend, James T

    2010-02-01

    Previous studies of global-local processing in autism spectrum disorders (ASDs) have indicated mixed findings, with some evidence of a local processing bias, or preference for detail-level information, and other results suggesting typical global advantage, or preference for the whole or gestalt. Findings resulting from this paradigm have been used to argue for or against a detail focused processing bias in ASDs, and thus have important theoretical implications. We applied Systems Factorial Technology, and the associated Double Factorial Paradigm (both defined in the text), to examine information processing characteristics during a divided attention global-local task in high-functioning individuals with an ASD and typically developing controls. Group data revealed global advantage for both groups, contrary to some current theories of ASDs. Information processing models applied to each participant revealed that task performance, although showing no differences at the group level, was supported by different cognitive mechanisms in ASD participants compared to controls. All control participants demonstrated inhibitory parallel processing and the majority demonstrated a minimum-time stopping rule. In contrast, ASD participants showed exhaustive parallel processing with mild facilitatory interactions between global and local information. Thus our results indicate fundamental differences in the stopping rules and channel dependencies in individuals with an ASD.

  10. Combinatorial development of antibacterial Zr-Cu-Al-Ag thin film metallic glasses

    PubMed Central

    Liu, Yanhui; Padmanabhan, Jagannath; Cheung, Bettina; Liu, Jingbei; Chen, Zheng; Scanley, B. Ellen; Wesolowski, Donna; Pressley, Mariyah; Broadbridge, Christine C.; Altman, Sidney; Schwarz, Udo D.; Kyriakides, Themis R.; Schroers, Jan

    2016-01-01

    Metallic alloys are normally composed of multiple constituent elements in order to achieve the integration of the plurality of properties required in technological applications. However, the conventional alloy development paradigm, based on a sequential trial-and-error approach, requires completely unrelated strategies to optimize compositions out of a vast phase space, making alloy development time consuming and labor intensive. Here, we challenge the conventional paradigm by proposing a combinatorial strategy that enables parallel screening of a multitude of alloys. Utilizing a typical metallic glass forming alloy system, Zr-Cu-Al-Ag, as an example, we demonstrate how glass formation and antibacterial activity, two unrelated properties, can be simultaneously characterized and the optimal composition efficiently identified. We found that in the Zr-Cu-Al-Ag alloy system a fully glassy phase can be obtained in a wide compositional range by co-sputtering, and that antibacterial activity is strongly dependent on alloy composition. Our results indicate that antibacterial activity is sensitive to Cu and Ag while remaining essentially unchanged within a wide range of Zr and Al. The proposed strategy not only facilitates the development of high-performing alloys, but also provides a tool to unveil the composition dependence of properties in a highly parallel fashion, which helps the development of new materials by design. PMID:27230692

  11. Performance Evaluation in Network-Based Parallel Computing

    NASA Technical Reports Server (NTRS)

    Dezhgosha, Kamyar

    1996-01-01

    Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require the use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing network of Sun SPARC workstations with PVM (Parallel Virtual Machine), a software system for linking clusters of machines. Second, a set of three basic applications was selected: a parallel search, a parallel sort, and a parallel matrix multiplication. These application programs were implemented in the C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time, which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes is in many cases the restricting factor on performance. That is, coarse-grain parallelism, which requires less frequent communication between processes, will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps), which will allow us to extend our study to newer applications, performance metrics, and configurations.

  12. WFIRST: Science from the Guest Investigator and Parallel Observation Programs

    NASA Astrophysics Data System (ADS)

    Postman, Marc; Nataf, David; Furlanetto, Steve; Milam, Stephanie; Robertson, Brant; Williams, Ben; Teplitz, Harry; Moustakas, Leonidas; Geha, Marla; Gilbert, Karoline; Dickinson, Mark; Scolnic, Daniel; Ravindranath, Swara; Strolger, Louis; Peek, Joshua; Marc Postman

    2018-01-01

    The Wide Field InfraRed Survey Telescope (WFIRST) mission will provide an extremely rich archival dataset that will enable a broad range of scientific investigations beyond the initial objectives of the proposed key survey programs. The scientific impact of WFIRST will thus be significantly expanded by a robust Guest Investigator (GI) archival research program. We will present examples of GI research opportunities ranging from studies of the properties of a variety of Solar System objects, surveys of the outer Milky Way halo, comprehensive studies of cluster galaxies, to unique and new constraints on the epoch of cosmic re-ionization and the assembly of galaxies in the early universe.WFIRST will also support the acquisition of deep wide-field imaging and slitless spectroscopic data obtained in parallel during campaigns with the coronagraphic instrument (CGI). These parallel wide-field imager (WFI) datasets can provide deep imaging data covering several square degrees at no impact to the scheduling of the CGI program. A competitively selected program of well-designed parallel WFI observation programs will, like the GI science above, maximize the overall scientific impact of WFIRST. We will give two examples of parallel observations that could be conducted during a proposed CGI program centered on a dozen nearby stars.

  13. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution in which one directly executes the application code but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
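
    LAPSE itself is far more elaborate, but the core idea of direct execution, running application code natively while advancing a simulated clock for modeled events, can be sketched as follows (the latency and bandwidth values are illustrative target-machine parameters, not LAPSE's):

      #include <time.h>

      static double sim_clock = 0.0;           /* simulated time, seconds */

      /* Model parameters for the *target* machine (illustrative values). */
      static const double LATENCY   = 40e-6;   /* per-message latency     */
      static const double BANDWIDTH = 100e6;   /* bytes per second        */

      static double wall(void) {
          struct timespec ts;
          clock_gettime(CLOCK_MONOTONIC, &ts);
          return ts.tv_sec + ts.tv_nsec * 1e-9;
      }

      /* Computation is directly executed and timed on the host... */
      void sim_compute(void (*work)(void)) {
          double t0 = wall();
          work();
          sim_clock += wall() - t0;
      }

      /* ...while communication is not executed but modeled, so the
         simulated clock reflects the presumed machine's network. */
      void sim_send(long nbytes) {
          sim_clock += LATENCY + nbytes / BANDWIDTH;
      }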

  14. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    DOE PAGES

    Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

    2015-01-01

    This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
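
    The paper's fix was written with Fortran coarrays; the same binary-tree idea, expressed in C with MPI point-to-point calls, looks like the sketch below (in practice one would simply call MPI_Allreduce, which typically uses such a tree internally):

      #include <mpi.h>

      /* O(log p) binary-tree sum of one double per rank; the result is
         valid on rank 0.  Works for any number of ranks. */
      double tree_sum(double local, MPI_Comm comm) {
          int rank, size;
          MPI_Comm_rank(comm, &rank);
          MPI_Comm_size(comm, &size);
          for (int step = 1; step < size; step <<= 1) {
              if (rank & step) {               /* sender at this level   */
                  MPI_Send(&local, 1, MPI_DOUBLE, rank - step, 0, comm);
                  break;
              } else if (rank + step < size) { /* receiver at this level */
                  double other;
                  MPI_Recv(&other, 1, MPI_DOUBLE, rank + step, 0, comm,
                           MPI_STATUS_IGNORE);
                  local += other;
              }
          }
          return local;
      }

    Replacing a p-step sequential accumulation with this log2(p)-step tree is what turns the reported flat scaling into nearly linear speedup.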

  15. Parallel Three-Dimensional Computation of Fluid Dynamics and Fluid-Structure Interactions of Ram-Air Parachutes

    NASA Technical Reports Server (NTRS)

    Tezduyar, Tayfun E.

    1998-01-01

    This is a final report as far as our work at University of Minnesota is concerned. The report describes our research progress and accomplishments in development of high performance computing methods and tools for 3D finite element computation of aerodynamic characteristics and fluid-structure interactions (FSI) arising in airdrop systems, namely ram-air parachutes and round parachutes. This class of simulations involves complex geometries, flexible structural components, deforming fluid domains, and unsteady flow patterns. The key components of our simulation toolkit are a stabilized finite element flow solver, a nonlinear structural dynamics solver, an automatic mesh moving scheme, and an interface between the fluid and structural solvers; all of these have been developed within a parallel message-passing paradigm.

  16. Programming Probabilistic Structural Analysis for Parallel Processing Computer

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

    1991-01-01

    The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.

  17. Hadoop neural network for parallel and distributed feature selection.

    PubMed

    Hodge, Victoria J; O'Keefe, Simon; Austin, Jim

    2016-06-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five features selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
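
    The paper's framework is Java on Hadoop and is not reproduced here; the decomposition it relies on, scoring each feature independently and in parallel, is shown below in a shared-memory C/OpenMP form (score_feature is a stand-in for any per-feature relevance measure, and columns are assumed stored contiguously):

      /* Stand-in per-feature relevance score, e.g., a correlation or
         mutual-information estimate computed from one data column. */
      extern double score_feature(const double *column, int n_rows);

      void score_all(const double *data, int n_rows, int n_features,
                     double *scores) {
          /* Features are scored independently: the same property that
             lets Hadoop split the work into map tasks lets OpenMP split
             it across threads here. */
          #pragma omp parallel for
          for (int f = 0; f < n_features; f++)
              scores[f] = score_feature(data + (long)f * n_rows, n_rows);
      }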

  18. Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

    NASA Technical Reports Server (NTRS)

    Weeks, Cindy Lou

    1986-01-01

    Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.

  19. Global Health Diplomacy, "San Francisco Values," and HIV/AIDS: From the Local to the Global.

    PubMed

    Kevany, Sebastian

    2015-01-01

    San Francisco has a distinguished history as a cosmopolitan, progressive, and international city, including extensive associations with global health. These circumstances have contributed to new, interdisciplinary scholarship in the field of global health diplomacy (GHD). In the present review, we describe the evolution and history of GHD at the practical and theoretical levels within the San Francisco medical community, trace related associations between the local and the global, and propose a range of potential opportunities for further development of this dynamic field. We provide a historical overview of the development of the "San Francisco Model" of collaborative, community-owned HIV/AIDS treatment and care programs as pioneered under the "Ward 86" paradigm of the 1980s. We traced the expansion and evolution of this model to the national level under the Ryan White Care Act, and internationally via the President's Emergency Plan for AIDS Relief. In parallel, we describe the evolution of global health diplomacy practices, from the local to the global, including the integration of GHD principles into intervention design to ensure social, political, and cultural acceptability and sensitivity. Global health programs, as informed by lessons learned from the San Francisco Model, are increasingly aligned with diplomatic principles and practices. This awareness has aided implementation, allowed policymakers to pursue related and progressive social and humanitarian issues in conjunction with medical responses, and elevated global health to the realm of "high politics." In the 21st century, the integration between diplomatic, medical, and global health practices will continue under "smart global health" and GHD paradigms. These approaches will enhance intervention cost-effectiveness by addressing and optimizing, in tandem with each other, a wide range of (health and non-health) foreign policy, diplomatic, security, and economic priorities in a synergistic manner--without sacrificing health outcomes. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.

  20. Cognitive remediation versus active computer control in bipolar disorder with psychosis: study protocol for a randomized controlled trial.

    PubMed

    Lewandowski, Kathryn Eve; Sperry, Sarah H; Ongur, Dost; Cohen, Bruce M; Norris, Lesley A; Keshavan, Matcheri S

    2016-03-12

    Cognitive dysfunction is a major feature of bipolar disorder with psychosis and is strongly associated with functional outcomes. Computer-based cognitive remediation has shown promise in improving cognition in patients with schizophrenia. However, despite similar neurocognitive deficits between patients with schizophrenia and bipolar disorder, few studies have extended neuroscience-based cognitive remediation programs to this population. The Treatment to Enhance Cognition in Bipolar Disorder study is an investigator-initiated, parallel group, randomized, blinded clinical trial of an Internet-based cognitive remediation protocol for patients with bipolar disorder I with psychosis (n = 100). We also describe the development of our dose-matched active control paradigm. Both conditions involve 70 sessions of computer-based activities over 24 weeks. The control intervention was developed to mirror the treatment condition in dose and format but without the neuroplasticity-based task design and structure. All participants undergo neuropsychological and clinical assessment at baseline, after approximately 25 hours of study activities, post treatment, and after 6 months of no study contact to assess durability. Neuroimaging at baseline and post treatment are offered in an "opt-in" format. The primary outcomes are scores on the MATRICS battery; secondary and exploratory outcomes include measures of clinical symptoms, community functioning, and neuroimaging changes. Associations between change in cognitive measures and change in community functioning will be assessed. Baseline predictors of treatment response will be examined. The present study is the first we are aware of to implement an Internet-based cognitive remediation program in patients with bipolar disorder with psychosis and to develop a comparable web-based control paradigm. The mixed online and study-site format allows accessible treatment while providing weekly staff contact and bridging. Based on user-provided feedback, participant blinding is feasible. ClinicalTrials.gov NCT01470781 ; 11 July 2011.

  1. Performance Implications of Synchronization Support for Parallel FORTRAN Programs

    DTIC Science & Technology

    1991-06-17

    applications we used in this study are BDNA and FLO52. BDNA is a molecular dynamics simulator for biomolecules in water and it uses ordinary... parallelism structures and loop granularity. In the BDNA program, most of the parallel loops are not nested and the iterations are 200-1000 instructions long... are of concern. The BDNA curve in Figure 21 shows that for this program only 17% of all... [Figure: cumulative percentage, BDNA vs. FLO52]

  2. Parallelization of Program to Optimize Simulated Trajectories (POST3D)

    NASA Technical Reports Server (NTRS)

    Hammond, Dana P.; Korte, John J. (Technical Monitor)

    2001-01-01

    This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
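
    The POST3D sources are not shown in the abstract, but an SPMD gradient computation of this kind conventionally has a standard shape: each rank evaluates the finite-difference perturbations for a subset of design variables. A hedged C/MPI sketch (the objective function is hypothetical, and one-sided differences are our simplification):

      #include <mpi.h>

      extern double objective(const double *x, int n);  /* hypothetical */

      /* One-sided finite-difference gradient; design variables are dealt
         round-robin across ranks, then combined with an allreduce. */
      void gradient(const double *x, int n, double h, double *g) {
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          double f0 = objective(x, n);
          double xp[n];                 /* C99 VLA; fine for a sketch */
          for (int i = 0; i < n; i++) g[i] = 0.0;

          for (int i = rank; i < n; i += size) {
              for (int j = 0; j < n; j++) xp[j] = x[j];
              xp[i] += h;
              g[i] = (objective(xp, n) - f0) / h;
          }
          /* Each rank filled disjoint components; summing merges them. */
          MPI_Allreduce(MPI_IN_PLACE, g, n, MPI_DOUBLE, MPI_SUM,
                        MPI_COMM_WORLD);
      }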

  3. Selective, Embedded, Just-In-Time Specialization (SEJITS): Portable Parallel Performance from Sequential, Productive, Embedded Domain-Specific Languages

    DTIC Science & Technology

    2012-12-01

    ...identity operation. SIMD: single instruction, multiple datastream parallel computing. Scala: a byte-compiled programming language featuring dynamic type... Armando Fox. ...application performance, but usually must rely on efficiency programmers who are experts in explicit parallel programming to achieve it. Since such efficiency

  4. Enabling GEODSS for Space Situational Awareness (SSA)

    NASA Astrophysics Data System (ADS)

    Wootton, S.

    2016-09-01

    The Ground-Based Electro-Optical Deep Space Surveillance (GEODSS) System has been in operation since the mid-1980's. While GEODSS has been the Space Surveillance Network's (SSN's) workhorse in terms of deep space surveillance, it has not undergone a significant modernization since the 1990's. This means GEODSS continues to operate under a mostly obsolete, legacy data processing baseline. The System Program Office (SPO) responsible for GEODSS, SMC/SYGO, has a number of advanced Space Situational Awareness (SSA)-related efforts in progress, in the form of innovative optical capabilities, data processing algorithms, and hardware upgrades. Each of these efforts is in various stages of evaluation and acquisition. These advanced capabilities rely upon a modern computing environment in which to integrate, but GEODSS does not have one—yet. The SPO is also executing a Service Life Extension Program (SLEP) to modernize the various subsystems within GEODSS, along with a parallel effort to implement a complete, modern software re-architecture. The goal is to use a modern, service-based architecture to provide expedient integration as well as easier and more sustainable expansion. This presentation will describe these modernization efforts in more detail and discuss how adopting such modern paradigms and practices will help ensure the GEODSS system remains relevant and sustainable far beyond 2027.

  5. Optimizing delivery of recovery-oriented online self-management strategies for bipolar disorder: a review.

    PubMed

    Leitan, Nuwan D; Michalak, Erin E; Berk, Lesley; Berk, Michael; Murray, Greg

    2015-03-01

    Self-management is emerging as a viable alternative to difficult-to-access psychosocial treatments for bipolar disorder (BD), and has particular relevance to recovery-related goals around empowerment and personal meaning. This review examines data and theory on BD self-management from a recovery-oriented perspective, with a particular focus on optimizing low-intensity delivery of self-management tools via the web. A critical evaluation of various literatures was undertaken. Literatures on recovery, online platforms, and self-management in mental health and BD are reviewed. The literature suggests that the self-management approach aligns with the recovery framework. However, studies have identified a number of potential barriers to the utilization of self-management programs for BD and it has been suggested that utilizing an online environment may be an effective way to surmount many of these barriers. Online self-management programs for BD are rapidly developing, and in parallel the recovery perspective is becoming the dominant paradigm for mental health services worldwide, so research is urgently required to assess the efficacy and safety of optimization methods such as professional and/or peer support, tailoring and the development of 'online communities'. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  6. The cosmic matrix in the 50th anniversary of relativistic astrophysics

    NASA Astrophysics Data System (ADS)

    Ruffini, R.; Aimuratov, Y.; Becerra, L.; Bianco, C. L.; Karlica, M.; Kovacevic, M.; Melon Fuksman, J. D.; Moradi, R.; Muccino, M.; Penacchioni, A. V.; Pisani, G. B.; Primorac, D.; Rueda, J. A.; Shakeri, S.; Vereshchagin, G. V.; Wang, Y.; Xue, S.-S.

    Our concept of induced gravitational collapse (IGC paradigm) starting from a supernova occurring with a companion neutron star, has unlocked the understanding of seven different families of gamma ray bursts (GRBs), indicating a path for the formation of black holes in the universe. An authentic laboratory of relativistic astrophysics has been unveiled in which new paradigms have been introduced in order to advance knowledge of the most energetic, distant and complex systems in our universe. A novel cosmic matrix paradigm has been introduced at a relativistic cosmic level, which parallels the concept of an S-matrix introduced by Feynman, Wheeler and Heisenberg in the quantum world of microphysics. Here the “in” states are represented by a neutron star and a supernova, while the “out” states, generated within less than a second, are a new neutron star and a black hole. This novel field of research needs very powerful technological observations in all wavelengths ranging from radio through optical, X-ray and gamma ray radiation all the way up to ultra-high-energy cosmic rays.

  7. Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

    PubMed

    Bellucci, Michael A; Coker, David F

    2011-07-28

    We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel, while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increases the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency are tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in the gas phase and in protic solvent. © 2011 American Institute of Physics.
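
    The adaptation idea behind the higher level, tuning operator probabilities toward whatever has recently improved fitness, can be sketched in isolation from the GP machinery as follows (the update rule is our illustration, not the paper's):

      /* Reward operators whose recent applications improved fitness,
         then renormalize so the probabilities remain a distribution. */
      #define N_OPS 3   /* e.g., mutation, crossover, subtree swap */

      void adapt_probs(double p[N_OPS], const int applied[N_OPS],
                       const int improved[N_OPS], double rate) {
          double total = 0.0;
          for (int k = 0; k < N_OPS; k++) {
              double success =
                  applied[k] ? (double)improved[k] / applied[k] : 0.0;
              p[k] = (1.0 - rate) * p[k] + rate * success;
              if (p[k] < 0.01) p[k] = 0.01;  /* keep every operator alive */
              total += p[k];
          }
          for (int k = 0; k < N_OPS; k++)
              p[k] /= total;                  /* renormalize */
      }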

  8. A New Paradigm for Ovarian Sex Cord-Stromal Tumor Development

    DTIC Science & Technology

    2016-05-01

    Award number: W81XWH-15-1-0082. Principal investigator: Qinglei Li. The project investigates the role of sustained activation of oocyte TGFBR1 in ovarian tumor development using an in vitro approach. Keywords: ovarian tumor, sex cord-stromal tumor.

  9. Concepts of Concurrent Programming

    DTIC Science & Technology

    1990-04-01

    ...to the material presented. Carriero89: Carriero, N., and Gelernter, D., "How to Write Parallel Programs: A Guide to the Perplexed," ACM... between the architectures on which programs can be executed and the application domains from which problems are drawn. Our goal is to show how programs... (Sept. 1989), 251-510. Abstract: There are four papers: 1. Programming Languages for Distributed Computing Systems (52); 2. How to Write Parallel...

  10. NavP: Structured and Multithreaded Distributed Parallel Programming

    NASA Technical Reports Server (NTRS)

    Pan, Lei; Xu, Jingling

    2006-01-01

    This slide presentation reviews some of the issues around distributed parallel programming. It compares and contrasts two methods of programming: Single Program Multiple Data (SPMD) and Navigational Programming (NavP). It then reviews the distributed sequential computing (DSC) method and the methodology of NavP, and presents case studies. It also reviews the work being done to enable the NavP system.

  11. High Performance Programming Using Explicit Shared Memory Model on Cray T3D

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Saini, Subhash; Grassi, Charles

    1994-01-01

    The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods are available to users: data parallel, work sharing, message passing using PVM, and the explicit shared memory model. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D, and we present the reasons for the poor performance of PVM as a native message-passing library. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained with the explicit shared memory model. A similar degradation is seen on the CM-5, where the performance of applications using the native message-passing library CMMD is also about 4 to 5 times lower than with data parallel methods. The issues involved in programming in the explicit shared memory model (such as barriers, synchronization, and invalidating and aligning the data cache) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, and IBM SP1 is presented.
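
    The explicit shared memory model referred to above lives on in the OpenSHMEM standard, which descends from the Cray shmem library. A minimal sketch, assuming an OpenSHMEM installation (modern API, not the original T3D code):

      #include <stdio.h>
      #include <shmem.h>

      int main(void) {
          static long src[4], dst[4];   /* symmetric: same address on every PE */
          shmem_init();
          int me = shmem_my_pe();
          int npes = shmem_n_pes();

          for (int i = 0; i < 4; i++) src[i] = me * 10 + i;

          /* one-sided put: write directly into the neighbour's memory,
             with no matching receive on the target side */
          shmem_long_put(dst, src, 4, (me + 1) % npes);
          shmem_barrier_all();          /* explicit synchronization */

          printf("PE %d received %ld %ld %ld %ld\n",
                 me, dst[0], dst[1], dst[2], dst[3]);
          shmem_finalize();
          return 0;
      }

    The absence of a receive call is the point: data movement is decoupled from synchronization, which helps explain why this model outran PVM's two-sided messaging on the T3D.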

  12. Teaching International Business Abroad: Paradigms Suggested by Metaphor Theory

    ERIC Educational Resources Information Center

    Starr-Glass, David

    2009-01-01

    When International Business (IB) is taught abroad, the educational institution has to decide on organizational issues and educational and teaching paradigms. College and university programs abroad can adopt organizational values and identities similar to the home institution, or adapt to local operating environments. Likewise, educational and…

  13. On program restructuring, scheduling, and communication for parallel processor systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Polychronopoulos, Constantine D.

    1986-08-01

    This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed with a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm, and its performance is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.
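
    Of the two restructuring techniques named, loop coalescing is easy to show in miniature. The C fragment below (our example, not the dissertation's) flattens a doubly nested loop into a single loop, turning N*M iterations into one pool that a scheduler can divide evenly:

      #define N 8
      #define M 16

      /* original doubly nested loop */
      void scale_nested(double a[N][M], double s) {
          for (int i = 0; i < N; i++)
              for (int j = 0; j < M; j++)
                  a[i][j] *= s;
      }

      /* coalesced form: one flat iteration space of N*M independent steps */
      void scale_coalesced(double a[N][M], double s) {
          for (int k = 0; k < N * M; k++) {
              int i = k / M;      /* recover the original row index */
              int j = k % M;      /* recover the original column index */
              a[i][j] *= s;
          }
      }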

  14. Mendelian genetics: Paradigm, conjecture, or research program

    NASA Astrophysics Data System (ADS)

    Oldham, V.; Brouwer, W.

    Kuhn's model of the structure of scientific revolutions, Popper's hypothetic-deductive model of science, and Lakatos's methodology of competing research programs are applied to a historical episode in biology. Each of these three models offers a different explanatory system for the development, neglect, and eventual acceptance of Mendel's paradigm of inheritance. The authors conclude that both rational and nonrational criteria play an important role during times of crisis in science, when different research programs compete for acceptance. It is suggested that Kuhn's model, emphasizing the nonrational basis of science, and Popper's model, emphasizing the rational basis of science, can be used fruitfully in high school science courses.

  15. Modelling parallel programs and multiprocessor architectures with AXE

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.

    1991-01-01

    AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user interface enables the user to model parallel programs and machines precisely and efficiently, and its quick turn-around time keeps the user interested and productive. AXE models multicomputers; the user may easily modify various architectural parameters, including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players, whose use and behavior are described. Performance data of the multiprocessor model can be observed on a color screen, including CPU and message routing bottlenecks and the dynamic status of the software.

  16. Web Based Parallel Programming Workshop for Undergraduate Education.

    ERIC Educational Resources Information Center

    Marcus, Robert L.; Robertson, Douglass

    Central State University (Ohio), under a contract with Nichols Research Corporation, has developed a World Wide Web based workshop on high performance computing entitled "IBM SP2 Parallel Programming Workshop." The research is part of the DoD (Department of Defense) High Performance Computing Modernization Program. The research…

  17. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    NASA Technical Reports Server (NTRS)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the Cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single-process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallel-software developer faces.

  18. Special Feature: Epistemological Paradigms in Evaluation: Implications for Practice. Section 1: "Paradigm Contrast" Case Studies.

    ERIC Educational Resources Information Center

    Rose, Tom; And Others

    1991-01-01

    This section presents two pragmatic studies focusing on a real-life situation and specific applications in managerial decision making. One study deals with improving occupational safety in a bedding manufacturing plant, whereas the other concentrates on managing a state mental health program. (SLD)

  19. A Paradigm for the Telephonic Assessment of Suicidal Ideation

    ERIC Educational Resources Information Center

    Halderman, Brent L.; Eyman, James R.; Kerner, Lisa; Schlacks, Bill

    2009-01-01

    A three-stage paradigm for telephonically assessing suicidal risk and triaging suicidal callers as practiced in an Employee Assistance Program Call Center was investigated. The first hypothesis was that the use of the procedure would increase the probability that callers would accept the clinician's recommendations, evidenced by fewer police…

  20. Staff Development: A Gestalt Paradigm.

    ERIC Educational Resources Information Center

    Parsons, Michael H.

    Hagerstown Junior College, Maryland, has had a staff development program for the past five years. The major components have been evaluated, revised, and integrated into a gestalt paradigm--a total institutional thrust designed to insure that the goals of the college meet the challenges presented by the service area. Each component exists to foster…

  1. Instrumentation, performance visualization, and debugging tools for multiprocessors

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.

    1991-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging and tuning parallel programs become intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored, and performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.
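
    The standard MPI profiling (PMPI) interface gives a feel for how such instrumentation hooks in without touching application source; the wrapper below is a generic sketch in that spirit, not AIMS code:

      #include <stdio.h>
      #include <mpi.h>

      /* Intercept MPI_Send: log the event, then forward to the real call.
         Linking this file ahead of the MPI library instruments every send. */
      int MPI_Send(const void *buf, int count, MPI_Datatype type,
                   int dest, int tag, MPI_Comm comm) {
          int rank;
          PMPI_Comm_rank(comm, &rank);
          double t = PMPI_Wtime();
          int rc = PMPI_Send(buf, count, type, dest, tag, comm);
          fprintf(stderr, "trace: t=%.6f rank %d -> %d, %d items, tag %d\n",
                  t, rank, dest, count, tag);
          return rc;
      }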

  2. Multi-Purpose, Application-Centric, Scalable I/O Proxy Application

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, M. C.

    2015-06-15

    MACSio is a Multi-purpose, Application-Centric, Scalable I/O proxy application. It is designed to support a number of goals with respect to parallel I/O performance testing and benchmarking, including the ability to test and compare various I/O libraries and I/O paradigms, to predict the scalable performance of real applications, and to help identify where improvements in I/O performance can be made within the HPC I/O software stack.
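
    The shape of the workload such a proxy measures can be pictured with a bare MPI-IO kernel; the sketch below is generic collective I/O under our own file and buffer names, not MACSio's actual plugin interface:

      #include <mpi.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          double part[1024];                      /* this rank's slab of data */
          for (int i = 0; i < 1024; i++) part[i] = rank + i * 1e-6;

          MPI_File fh;
          MPI_File_open(MPI_COMM_WORLD, "dump.dat",
                        MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
          /* each rank writes at a disjoint offset, in one collective call */
          MPI_Offset off = (MPI_Offset)rank * sizeof(part);
          MPI_File_write_at_all(fh, off, part, 1024, MPI_DOUBLE,
                                MPI_STATUS_IGNORE);
          MPI_File_close(&fh);

          MPI_Finalize();
          return 0;
      }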

  3. Design of Arithmetic Circuits for Complex Binary Number System

    NASA Astrophysics Data System (ADS)

    Jamil, Tariq

    2011-08-01

    Complex numbers play an important role in various engineering applications. To represent these numbers efficiently for storage and manipulation, a (-1+j)-base complex binary number system (CBNS) has been proposed in the literature. In this paper, designs of nibble-size arithmetic circuits (adder, subtractor, multiplier, divider) are presented. These circuits can be incorporated within von Neumann and associative dataflow processors to achieve higher performance in both sequential and parallel computing paradigms.
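
    The (-1+j) base uses only the digits 0 and 1, so a single bit string encodes a full complex number; conversion is by repeated division by (-1+j). The C sketch below is a software illustration of the number system, not the paper's circuits; it uses the fact that a+bj is divisible by -1+j exactly when a+b is even:

      #include <stdio.h>

      /* Convert the Gaussian integer a + b*j to its (-1+j)-base digit string.
         Buffer sized for small demo inputs. */
      void to_cbns(long a, long b, char *out) {
          char buf[128];
          int n = 0;
          if (a == 0 && b == 0) { out[0] = '0'; out[1] = '\0'; return; }
          while (a != 0 || b != 0) {
              long d = ((a + b) % 2 + 2) % 2;  /* remainder digit, 0 or 1 */
              a -= d;                          /* now a+b is even: divisible */
              long na = (b - a) / 2;           /* (a + b*j) / (-1 + j) */
              long nb = -(a + b) / 2;
              a = na; b = nb;
              buf[n++] = (char)('0' + d);
          }
          for (int i = 0; i < n; i++) out[i] = buf[n - 1 - i];  /* MSB first */
          out[n] = '\0';
      }

      int main(void) {
          char s[128];
          to_cbns(3, 4, s);
          printf("3+4j in base (-1+j): %s\n", s);   /* prints 1111101 */
          return 0;
      }

    As a check, 1111101 expands to B^6 + B^5 + B^4 + B^3 + B^2 + 1 with B = -1+j, which is 8j + (4-4j) - 4 + (2+2j) - 2j + 1 = 3+4j.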

  4. Neural Basis of Empathy and Its Dysfunction in Autism Spectrum Disorders (ASD)

    DTIC Science & Technology

    2013-08-01

    ...representation and social learning (Chang et al., 2013). Therefore, we tested the causality of ACCg neurons in addition to neurons in a series of prefrontal... paradigms, we in parallel trained new monkeys to perform observational social learning tasks, which also depend on sensitivity to the rewarding... experiences of another individual. We tested the causal contribution of ACCg and insula to social learning by investigating whether observer monkeys...

  5. Lightweight and Statistical Techniques for Petascale Debugging: Correctness on Petascale Systems (CoPS) Preliminary Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    de Supinski, B R; Miller, B P; Liblit, B

    2011-09-13

    Petascale platforms with O(10^5) and O(10^6) processing cores are driving advancements in a wide range of scientific disciplines. These large systems create unprecedented application development challenges. Scalable correctness tools are critical to shorten the time-to-solution on these systems. Currently, many DOE application developers use primitive manual debugging based on printf or traditional debuggers such as TotalView or DDT. This paradigm breaks down beyond a few thousand cores, yet bugs often arise above that scale. Programmers must reproduce problems in smaller runs to analyze them with traditional tools, or else perform repeated runs at scale using only primitive techniques. Even when traditional tools run at scale, the approach wastes substantial effort and computation cycles. Continued scientific progress demands new paradigms for debugging large-scale applications. The Correctness on Petascale Systems (CoPS) project is developing a revolutionary debugging scheme that will reduce the debugging problem to a scale that human developers can comprehend. The scheme can provide precise diagnoses of the root causes of failure, including suggestions of the location and the type of errors down to the level of code regions or even a single execution point. Our fundamentally new strategy combines and expands three relatively new complementary debugging approaches. The Stack Trace Analysis Tool (STAT), a 2011 R&D 100 Award Winner, identifies behavior equivalence classes in MPI jobs and highlights behavior when elements of the class demonstrate divergent behavior, often the first indicator of an error. The Cooperative Bug Isolation (CBI) project has developed statistical techniques for isolating programming errors in widely deployed code that we will adapt to large-scale parallel applications. Finally, we are developing a new approach to parallelizing expensive correctness analyses, such as analysis of memory usage in the Memgrind tool. In the first two years of the project, we have successfully extended STAT to determine the relative progress of different MPI processes. We have shown that STAT, which is now included in the debugging tools distributed by Cray with their large-scale systems, substantially reduces the scale at which traditional debugging techniques are applied. We have extended CBI to large-scale systems and developed new compiler based analyses that reduce its instrumentation overhead. Our results demonstrate that CBI can identify the source of errors in large-scale applications. Finally, we have developed MPIecho, a new technique that will reduce the time required to perform key correctness analyses, such as the detection of writes to unallocated memory. Overall, our research results are the foundations for new debugging paradigms that will improve application scientist productivity by reducing the time to determine which package or module contains the root cause of a problem that arises at all scales of our high end systems. While we have made substantial progress in the first two years of CoPS research, significant work remains. While STAT provides scalable debugging assistance for incorrect application runs, we could apply its techniques to assertions in order to observe deviations from expected behavior. Further, we must continue to refine STAT's techniques to represent behavioral equivalence classes efficiently as we expect systems with millions of threads in the next year. We are exploring new CBI techniques that can assess the likelihood that execution deviations from past behavior are the source of erroneous execution. Finally, we must develop usable correctness analyses that apply the MPIecho parallelization strategy in order to locate coding errors. We expect to make substantial progress on these directions in the next year but anticipate that significant work will remain to provide usable, scalable debugging paradigms.

  6. Complementary development of prevention and mental health promotion programs for Canadian children based on contemporary scientific paradigms.

    PubMed

    Breton, J J

    1999-04-01

    Confusion regarding definitions and standards of prevention and promotion programs is pervasive, as revealed by a review of such programs in Canada. This paper examines how a discussion of scientific paradigms can help clarify models of prevention and mental health promotion and proposes the complementary development of prevention and promotion programs. A paradigm shift in science contributed to the emergence of the transactional model, advocating multiple causes and dynamic transactions between the individual and the environment. Consequently, the view of prevention applying over a linear continuum and of single stressful events causing mental disorders may no longer be appropriate. It is the author's belief that the new science of chaos theory, which addresses processes involved in the development of systems, can be applied to child development and thus to the heart of prevention and promotion programs. Critical moments followed by transitions or near-chaotic behaviours lead to stable states better adapted to the environment. Prevention programs would focus on the critical moments and target groups at risk to reduce risk factors. Promotion programs would focus on stable states and target the general population to develop age-appropriate life skills. The concept of sensitive dependence on initial conditions and certain empirical studies suggest that the programs would have the greatest impact at the beginning of life. It is hoped that this effort to organize knowledge about conceptual models of prevention and mental health promotion programs will foster the development of these programs to meet the urgent needs of Canadian children.

  7. Parallel computation with the force

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1985-01-01

    A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code, and it has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures, and performance implications.

  8. Stabilization and Reconstruction Operations: A New Paradigm, Analysis Tool, and US Air Force Role

    DTIC Science & Technology

    2007-03-08

    ...Maslow in his 1943 treatise, A Theory of Human Motivation, from which his Hierarchy of Needs Pyramid is derived. An interesting observation can be... or program budgets, but out of a necessity to retain relevancy in today's conflict environment. It has taken over a decade to reach this point, but... than rigorous scientific explanation. The New Paradigm: The new paradigm is not really new at all but is derived from noted psychologist Abraham...

  9. 76 FR 62808 - Pilot Program for Parallel Review of Medical Products

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-11

    ... voluntary participation in the pilot program, as well as the guiding principles the Agencies intend to... 57045), parallel review is intended to reduce the time between FDA marketing approval and CMS national...

  10. Algorithms and programming tools for image processing on the MPP

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.

    1985-01-01

    Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.

  11. Execution models for mapping programs onto distributed memory parallel computers

    NASA Technical Reports Server (NTRS)

    Sussman, Alan

    1992-01-01

    The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.

  12. Program Correctness, Verification and Testing for Exascale (Corvette)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sen, Koushik; Iancu, Costin; Demmel, James W

    The goal of this project is to provide tools to assess the correctness of parallel programs written using hybrid parallelism. There is a dire lack of both theoretical and engineering know-how in the area of finding bugs in hybrid or large scale parallel programs, which our research aims to change. In the project we have demonstrated novel approaches in several areas: 1. Low overhead automated and precise detection of concurrency bugs at scale. 2. Using low overhead bug detection tools to guide speculative program transformations for performance. 3. Techniques to reduce the concurrency required to reproduce a bug using partial program restart/replay. 4. Techniques to provide reproducible execution of floating point programs. 5. Techniques for tuning the floating point precision used in codes.

  13. Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    This paper presents our experiences of parallelizing the sequential implementation of the NAS benchmarks using compiler directives on the SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow users to exploit parallelism. Native compilers on the SGI Origin2000 support multiprocessing directives that allow users to exploit loop-level parallelism in their programs; supporting tools can accomplish this process automatically and present the results of parallelization to the user. We experimented with these compiler directives and supporting tools by parallelizing the sequential implementation of the NAS benchmarks. Results reported in this paper indicate that, with minimal effort, the performance achieved is comparable to that of the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
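
    The flavor of directive-based loop-level parallelism is captured in a few lines; the example below uses portable OpenMP syntax rather than the SGI-specific multiprocessing directives of the paper:

      #include <stdio.h>
      #include <omp.h>

      #define N 1000000

      int main(void) {
          static double a[N], b[N];
          double sum = 0.0;

          for (int i = 0; i < N; i++) b[i] = i * 0.5;

          /* one directive parallelizes the loop; the reduction clause
             handles the cross-iteration dependence on sum */
          #pragma omp parallel for reduction(+:sum)
          for (int i = 0; i < N; i++) {
              a[i] = 2.0 * b[i];
              sum += a[i];
          }
          printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
          return 0;
      }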

  14. Trace-Driven Debugging of Message Passing Programs

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Hood, Robert; Lopez, Louis; Bailey, David (Technical Monitor)

    1998-01-01

    In this paper we report on features added to a parallel debugger to simplify the debugging of parallel message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.

  15. Exploiting Symmetry on Parallel Architectures.

    NASA Astrophysics Data System (ADS)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs and discovered a number of new results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  16. MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

    NASA Astrophysics Data System (ADS)

    Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.

    1995-03-01

    PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.

  17. MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simunovic, S.; Zacharia, T.; Baltas, N.

    1995-04-01

    PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.

  18. Generating performance portable geoscientific simulation code with Firedrake (Invited)

    NASA Astrophysics Data System (ADS)

    Ham, D. A.; Bercea, G.; Cotter, C. J.; Kelly, P. H.; Loriant, N.; Luporini, F.; McRae, A. T.; Mitchell, L.; Rathgeber, F.

    2013-12-01

    This presentation will demonstrate how a change in simulation programming paradigm can be exploited to deliver sophisticated simulation capability which is far easier to program than conventional models, is capable of exploiting different emerging parallel hardware, and is tailored to the specific needs of geoscientific simulation. Geoscientific simulation represents a grand challenge computational task: many of the largest computers in the world are devoted to this field, and the resolution and complexity requirements of its scientists are far from being sated. However, single thread performance has stalled, even sometimes decreased, over the last decade, and has been replaced by ever more parallel systems: both as conventional multicore CPUs and in the emerging world of accelerators. At the same time, the need of scientists to couple ever-more complex dynamics and parametrisations into their models makes the model development task vastly more complex. The conventional approach of writing code in low level languages such as Fortran or C/C++ and then hand-coding parallelism for different platforms by adding library calls and directives forces the intermingling of the numerical code with its implementation. This results in an almost impossible set of skill requirements for developers, who must simultaneously be domain science experts, numericists, software engineers and parallelisation specialists. Even more critically, it requires code to be essentially rewritten for each emerging hardware platform. Since new platforms are emerging constantly, and since code owners do not usually control the procurement of the supercomputers on which they must run, this represents an unsustainable development load. The Firedrake system, conversely, offers the developer the opportunity to write PDE discretisations in the high-level mathematical language UFL from the FEniCS project (http://fenicsproject.org). Non-PDE model components, such as parametrisations, can be written as short C kernels operating locally on the underlying mesh, with no explicit parallelism. The executable code is then generated in C, CUDA or OpenCL and executed in parallel on the target architecture. The system also offers features of special relevance to the geosciences. In particular, the large scale separation between the vertical and horizontal directions in many geoscientific processes can be exploited to offer the flexibility of unstructured meshes in the horizontal direction, without the performance penalty usually associated with those methods.

  19. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    NASA Astrophysics Data System (ADS)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications.

    Program summary
    Program title: SWsolver
    Catalogue identifier: AEGY_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GPL v3
    No. of lines in distributed program, including test data, etc.: 59 168
    No. of bytes in distributed program, including test data, etc.: 453 409
    Distribution format: tar.gz
    Programming language: C, CUDA
    Computer: Parallel computing clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
    Operating system: Linux
    Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell processors, and 1-32 NVIDIA GPUs.
    RAM: Tested on problems requiring up to 4 GB per compute node.
    Classification: 12
    External routines: MPI, CUDA, IBM Cell SDK
    Nature of problem: MPI-parallel simulation of shallow water equations using a high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell processor, and NVIDIA GPU using CUDA.
    Solution method: SWsolver provides 3 implementations of a high-resolution 2D shallow water equation solver on regular Cartesian grids, for CPU, Cell processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
    Additional comments: Sub-program numdiff is used for the test run.

  20. ORCA Project: Research on high-performance parallel computer programming environments. Final report, 1 Apr-31 Mar 90

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Snyder, L.; Notkin, D.; Adams, L.

    1990-03-31

    This task relates to research on programming massively parallel computers. Previous work on the Ensemble concept of programming was extended, and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensemble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels: composition of machine instructions, composition of processes, and composition of phases. It was applied to shared memory models of computation. During the present research period, these concepts were extended to nonshared memory models; one Ph.D. thesis was completed, and one book chapter and six conference papers were published.

  1. Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2002-01-01

    Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.

  2. Knowledge, programming, and programming cultures: LISP, C, and Ada

    NASA Technical Reports Server (NTRS)

    Rochowiak, Daniel

    1990-01-01

    The results of research 'Ada as an implementation language for knowledge based systems' are presented. The purpose of the research was to compare Ada to other programming languages. The report focuses on the programming languages Ada, C, and Lisp, the programming cultures that surround them, and the programming paradigms they support.

  3. Haptic adaptation to slant: No transfer between exploration modes

    PubMed Central

    van Dam, Loes C. J.; Plaisier, Myrthe A.; Glowania, Catharina; Ernst, Marc O.

    2016-01-01

    Human touch is an inherently active sense: to estimate an object’s shape humans often move their hand across its surface. This way the object is sampled both in a serial (sampling different parts of the object across time) and parallel fashion (sampling using different parts of the hand simultaneously). Both the serial (moving a single finger) and parallel (static contact with the entire hand) exploration modes provide reliable and similar global shape information, suggesting the possibility that this information is shared early in the sensory cortex. In contrast, we here show the opposite. Using an adaptation-and-transfer paradigm, a change in haptic perception was induced by slant-adaptation using either the serial or parallel exploration mode. A unified shape-based coding would predict that this would equally affect perception using other exploration modes. However, we found that adaptation-induced perceptual changes did not transfer between exploration modes. Instead, serial and parallel exploration components adapted simultaneously, but to different kinaesthetic aspects of exploration behaviour rather than object-shape per se. These results indicate that a potential combination of information from different exploration modes can only occur at down-stream cortical processing stages, at which adaptation is no longer effective. PMID:27698392

  4. Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Lesoinne, Michel

    1993-01-01

    Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
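
    One of the simplest members of this family of partitioning algorithms is recursive coordinate bisection; the sketch below (our illustration, not one of the paper's algorithms) splits points at the median coordinate, alternating axes, until the requested number of subdomains is reached:

      #include <stdio.h>
      #include <stdlib.h>

      typedef struct { double x, y; int part; } Node;

      static int axis;                      /* 0: compare x, 1: compare y */
      static int cmp(const void *p, const void *q) {
          const Node *a = p, *b = q;
          double da = axis ? a->y : a->x, db = axis ? b->y : b->x;
          return (da > db) - (da < db);
      }

      /* Assign nodes n[0..count) to parts first..first+parts-1 by median splits. */
      static void rcb(Node *n, int count, int parts, int first, int ax) {
          if (parts == 1) {
              for (int i = 0; i < count; i++) n[i].part = first;
              return;
          }
          axis = ax;
          qsort(n, count, sizeof(Node), cmp);   /* median split on one axis */
          int half = count / 2;
          rcb(n, half, parts / 2, first, 1 - ax);
          rcb(n + half, count - half, parts - parts / 2,
              first + parts / 2, 1 - ax);
      }

      int main(void) {
          Node mesh[16];
          for (int i = 0; i < 16; i++) { mesh[i].x = i % 4; mesh[i].y = i / 4; }
          rcb(mesh, 16, 4, 0, 0);               /* 4 balanced subdomains */
          for (int i = 0; i < 16; i++)
              printf("(%g,%g) -> part %d\n", mesh[i].x, mesh[i].y, mesh[i].part);
          return 0;
      }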

  5. Electrophysiological evidence for parallel and serial processing during visual search.

    PubMed

    Luck, S J; Hillyard, S A

    1990-12-01

    Event-related potentials were recorded from young adults during a visual search task in order to evaluate parallel and serial models of visual processing in the context of Treisman's feature integration theory. Parallel and serial search strategies were produced by the use of feature-present and feature-absent targets, respectively. In the feature-absent condition, the slopes of the functions relating reaction time and latency of the P3 component to set size were essentially identical, indicating that the longer reaction times observed for larger set sizes can be accounted for solely by changes in stimulus identification and classification time, rather than changes in post-perceptual processing stages. In addition, the amplitude of the P3 wave on target-present trials in this condition increased with set size and was greater when the preceding trial contained a target, whereas P3 activity was minimal on target-absent trials. These effects are consistent with the serial self-terminating search model and appear to contradict parallel processing accounts of attention-demanding visual search performance, at least for a subset of search paradigms. Differences in ERP scalp distributions further suggested that different physiological processes are utilized for the detection of feature presence and absence.

  6. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon function calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline (the prototypical non-data-parallel algorithm) can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
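
    The side-by-side coexistence of the two array views can be pictured with plain MPI; the sketch below illustrates the concept only, and its structure and names are ours, not Charon's actual API:

      #include <stdio.h>
      #include <mpi.h>

      #define N 16

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          double legacy[N];              /* non-distributed view, as in old code */
          int local_n = N / size;        /* assume size divides N */
          double dist[N];                /* this rank's block of the new view */

          if (rank == 0)
              for (int i = 0; i < N; i++) legacy[i] = i * i;

          /* map: non-distributed -> distributed */
          MPI_Scatter(legacy, local_n, MPI_DOUBLE, dist, local_n, MPI_DOUBLE,
                      0, MPI_COMM_WORLD);

          for (int i = 0; i < local_n; i++) dist[i] += 1.0;  /* parallel work */

          /* map back, so untouched legacy code below still sees a full array */
          MPI_Gather(dist, local_n, MPI_DOUBLE, legacy, local_n, MPI_DOUBLE,
                     0, MPI_COMM_WORLD);

          if (rank == 0) printf("legacy[5] = %g\n", legacy[5]);
          MPI_Finalize();
          return 0;
      }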

  7. Educating the Psychology Workforce in the Age of the Affordable Care Act: A Graduate Course Modeled after the Priorities of the Patient-Centered Outcomes Research Institute (PCORI)

    PubMed Central

    Hoerger, Michael

    2015-01-01

    The Affordable Care Act (ACA) represents a paradigm shift in the U.S. healthcare system, which has implications for psychology programs producing the next generation of trainees. In particular, the ACA has established the Patient-Centered Outcomes Research Institute (PCORI), which has been tasked with developing national priorities and funding research aimed at improving healthcare quality by helping patients and providers to make informed healthcare decisions. PCORI's national priorities span five broad domains: person-centered outcomes research, health disparities research, healthcare systems research, communication and dissemination research, and methodologic research. As these national priorities overlap with the knowledge and skills often emphasized in psychology training programs, initiatives by training programs to bolster strengths in these domains could place trainees at the forefront of this emerging research paradigm. As a part of a new Masters program in behavioral health, our program developed a health psychology course modeled around PCORI's five national priorities, and an initial evaluation in a small sample supported student learning in the five PCORI domains. In summary, the current report has implications for familiarizing readers with PCORI's national priorities for U.S. healthcare, stimulating debate surrounding psychology's response to the largest healthcare paradigm shift in recent U.S. history, and providing a working model for programs seeking to implement PCORI-related changes to their curricula. PMID:26843899

  8. 78 FR 76628 - Pilot Program for Parallel Review of Medical Products; Extension of the Duration of the Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-18

    ...The Food and Drug Administration (FDA) and the Centers for Medicare and Medicaid Services (CMS) (the Agencies) are announcing the extension of the ``Pilot Program for Parallel Review of Medical Products.'' The Agencies have decided to continue the program as currently designed for an additional period of 2 years from the date of publication of this notice.

  9. Astrophysical data mining with GPU. A case study: Genetic classification of globular clusters

    NASA Astrophysics Data System (ADS)

    Cavuoti, S.; Garofalo, M.; Brescia, M.; Paolillo, M.; Pescapè, A.; Longo, G.; Ventre, G.

    2014-01-01

    We present a multi-purpose genetic algorithm, designed and implemented with GPGPU/CUDA parallel computing technology. The model was derived from our CPU serial implementation, named GAME (Genetic Algorithm Model Experiment). It was successfully tested and validated on the detection of candidate Globular Clusters in deep, wide-field, single band HST images. The GPU version of GAME will be made available to the community by integrating it into the web application DAMEWARE (DAta Mining Web Application REsource, http://dame.dsf.unina.it/beta_info.html), a public data mining service specialized on massive astrophysical data. Since genetic algorithms are inherently parallel, the GPGPU computing paradigm leads to a speedup of a factor of 200× in the training phase with respect to the CPU based version.

  10. a Non-Overlapping Discretization Method for Partial Differential Equations

    NASA Astrophysics Data System (ADS)

    Rosas-Medina, A.; Herrera, I.

    2013-05-01

    Mathematical models of many systems of interest, including very important continuous systems of Engineering and Science, lead to a great variety of partial differential equations whose solution methods are based on the computational processing of large-scale algebraic systems. Furthermore, the incredible expansion experienced by the existing computational hardware and software has made amenable to effective treatment problems of an ever increasing diversity and complexity, posed by engineering and scientific applications. The emergence of parallel computing prompted on the part of the computational-modeling community a continued and systematic effort with the purpose of harnessing it for the endeavor of solving boundary-value problems (BVPs) of partial differential equations. Very early after such an effort began, it was recognized that domain decomposition methods (DDM) were the most effective technique for applying parallel computing to the solution of partial differential equations, since such an approach drastically simplifies the coordination of the many processors that carry out the different tasks and also greatly reduces the requirements of information-transmission between them. Ideally, DDMs intend to produce algorithms that fulfill the DDM-paradigm; i.e., such that "the global solution is obtained by solving local problems defined separately in each subdomain of the coarse-mesh (or domain decomposition)". Stated in a simplistic manner, the basic idea is that, when the DDM-paradigm is satisfied, full parallelization can be achieved by assigning each subdomain to a different processor. When intensive DDM research began, much attention was given to overlapping DDMs, but soon after attention shifted to non-overlapping DDMs. This evolution seems natural when the DDM-paradigm is taken into account: it is easier to uncouple the local problems when the subdomains are separated. However, an important limitation of non-overlapping domain decompositions, as that concept is usually understood today, is that interface nodes are shared by two or more subdomains of the coarse-mesh and, therefore, even non-overlapping DDMs are actually overlapping when seen from the perspective of the nodes used in the discretization. In this talk we present and discuss a discretization method in which the nodes used are non-overlapping, in the sense that each one of them belongs to one and only one subdomain of the coarse-mesh.
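
    In its smallest form, the DDM idea looks like the toy below: a 1-D Poisson problem, -u'' = 1 with u(0) = u(1) = 0, swept one subdomain at a time, the two local solves coupling only through the shared interface node. This is a conceptual sketch of domain decomposition generally, not the authors' non-overlapping method:

      #include <stdio.h>

      #define N 21                       /* grid points, boundaries included */

      int main(void) {
          double u[N] = {0.0}, h = 1.0 / (N - 1);
          int mid = N / 2;               /* interface node between subdomains */

          for (int it = 0; it < 2000; it++) {
              /* local sweep on subdomain 1: nodes 1..mid-1 */
              for (int i = 1; i < mid; i++)
                  u[i] = 0.5 * (u[i-1] + u[i+1] + h * h);
              /* local sweep on subdomain 2: nodes mid+1..N-2 */
              for (int i = mid + 1; i < N - 1; i++)
                  u[i] = 0.5 * (u[i-1] + u[i+1] + h * h);
              /* interface update couples the two local solutions */
              u[mid] = 0.5 * (u[mid-1] + u[mid+1] + h * h);
          }
          printf("u(0.5) = %f (exact 0.125)\n", u[mid]);
          return 0;
      }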

  11. Study on Instructional Paradigms of Virtual Education in Pakistan: A Learners' Perspective

    ERIC Educational Resources Information Center

    Hussain, Irshad

    2012-01-01

    The present study is aimed at examining instructional paradigms of virtual education in Pakistan. The population of the study consisted of learners from Master of Business Administration (MBA) Program at Virtual University (VU) of Pakistan. The researcher adopted convenient sampling technique and collected data from 600 learners through five-point…

  12. Toward an Ecological Paradigm in Adventure Programming

    ERIC Educational Resources Information Center

    Beringer, Almut

    2004-01-01

    Many forms of adventure therapy, in particular wilderness therapy, rely on challenges in the outdoors to achieve objectives of client change. While nature is drawn on as a medium for therapy and healing, some adventure therapists give nature little if any mention when it comes to explaining therapeutic success. The dominant paradigm in psychology…

  13. School-Business-University Collaborative: A New Paradigm for Urban Education.

    ERIC Educational Resources Information Center

    Deighan, William P.

    This paper describes a new paradigm for urban education, a school, business, and university collaboration in Cleveland (Ohio). Participating in the partnership are John F. Kennedy High School, the East Ohio Gas Company, and the Graduate Program in Administration and Supervision at John Carroll University in University Heights (Ohio). The paper…

  14. Developing Competency-Based Advising Practices in Response to Paradigm Shifts in Higher Education

    ERIC Educational Resources Information Center

    Walters, Giovanna

    2016-01-01

    Competency-based programs have gained prominence in recent years for two primary reasons. First, more students are seeking ways to apply nonclassroom learning experiences toward a degree. Second, a paradigm shift in higher education encourages postsecondary curriculum developers to accept nonclassroom experiences as demonstrations of skills and…

  15. Exploring the Synergies between the Object Oriented Paradigm and Mathematics: A Java Led Approach

    ERIC Educational Resources Information Center

    Conrad, Marc; French, Tim

    2004-01-01

    While the object oriented paradigm and its instantiation within programming languages such as Java has become a ubiquitous part of both the commercial and educational landscapes, its usage as a visualization technique within mathematics undergraduate programmes of study has perhaps been somewhat underestimated. By regarding the object oriented…

  16. Application of a Curriculum Hierarchy Evaluation (CHE) Model to Sequentially Arranged Tasks.

    ERIC Educational Resources Information Center

    O'Malley, J. Michael

    A curriculum hierarchy evaluation (CHE) model was developed by combining a transfer paradigm with an aptitude-treatment-task interaction (ATTI) paradigm. Positive transfer was predicted between sequentially arranged tasks, and a programed or nonprogramed treatment was predicted to interact with aptitude and with tasks. Eighteen four and five…

  17. Automatic Management of Parallel and Distributed System Resources

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  18. Describing, using 'recognition cones'. [parallel-series model with English-like computer program]

    NASA Technical Reports Server (NTRS)

    Uhr, L.

    1973-01-01

    A parallel-serial 'recognition cone' model is examined, taking into account the model's ability to describe scenes of objects. An actual program is presented in an English-like language. The concept of a 'description' is discussed together with possible types of descriptive information. Questions regarding the level and the variety of detail are considered along with approaches for improving the serial representations of parallel systems.

  19. Space transportation alternatives for large space programs: The International Space University Summer Session, 1992

    NASA Technical Reports Server (NTRS)

    Palaszewski, Bryan A.

    1993-01-01

    In 1992, the International Space University (ISU) held its Summer Session in Kitakyushu, Japan. This paper summarizes and expands upon some aspects of space solar power and space transportation that were considered during that session. The issues discussed in this paper are the result of a 10-week study by the Space Solar Power Program design project members and the Space Transportation Group to investigate new paradigms in space propulsion and how those paradigms might reduce the costs for large space programs. The program plan was to place a series of power satellites in Earth orbit. Several designs were studied where many kW, MW, or GW of power would be transmitted to Earth or to other spacecraft in orbit. During the summer session, a space solar power system was also detailed and analyzed. A high-cost space transportation program is potentially the most crippling barrier to such a space power program. At ISU, the focus of the study was to foster and develop some of the new paradigms that may eliminate the barriers to low cost for space exploration and exploitation. Many international and technical aspects of a large multinational program were studied. Environmental safety, space construction and maintenance, legal and policy issues of frequency allocation, technology transfer and control and many other areas were addressed. Over 120 students from 29 countries participated in this summer session. The results discussed in this paper, therefore, represent the efforts of many nations.

  20. Effectiveness of music therapy as an aid to neurorestoration of children with severe neurological disorders

    PubMed Central

    Bringas, Maria L.; Zaldivar, Marilyn; Rojas, Pedro A.; Martinez-Montes, Karelia; Chongo, Dora M.; Ortega, Maria A.; Galvizu, Reynaldo; Perez, Alba E.; Morales, Lilia M.; Maragoto, Carlos; Vera, Hector; Galan, Lidice; Besson, Mireille; Valdes-Sosa, Pedro A.

    2015-01-01

    This study was a two-armed parallel group design aimed at testing real world effectiveness of a music therapy (MT) intervention for children with severe neurological disorders. The control group received only the standard neurorestoration program and the experimental group received an additional MT “Auditory Attention plus Communication protocol” just before the usual occupational and speech therapy. Multivariate Item Response Theory (MIRT) identified a neuropsychological status-latent variable manifested in all children and which exhibited highly significant changes only in the experimental group. Changes in brain plasticity also occurred in the experimental group, as evidenced using a Mismatch Event Related paradigm which revealed significant post intervention positive responses in the latency range between 308 and 400 ms in frontal regions. LORETA EEG source analysis identified prefrontal and midcingulate regions as differentially activated by the MT in the experimental group. Taken together, our results showing improved attention and communication as well as changes in brain plasticity in children with severe neurological impairments, confirm the importance of MT for the rehabilitation of patients across a wide range of dysfunctions. PMID:26582974

  1. Community Petascale Project for Accelerator Science and Simulation: Advancing Computational Science for Future Accelerators and Accelerator Technologies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Spentzouris, Panagiotis; Cary, John

    The design and performance optimization of particle accelerators are essential for the success of the DOE scientific program in the next decade. Particle accelerators are very complex systems whose accurate description involves a large number of degrees of freedom and requires the inclusion of many physics processes. Building on the success of the SciDAC-1 Accelerator Science and Technology project, the SciDAC-2 Community Petascale Project for Accelerator Science and Simulation (ComPASS) is developing a comprehensive set of interoperable components for beam dynamics, electromagnetics, electron cooling, and laser/plasma acceleration modelling. ComPASS is providing accelerator scientists the tools required to enable the necessary accelerator simulation paradigm shift from high-fidelity single physics process modeling (covered under SciDAC-1) to high-fidelity multiphysics modeling. Our computational frameworks have been used to model the behavior of a large number of accelerators and accelerator R&D experiments, assisting both their design and performance optimization. As parallel computational applications, the ComPASS codes have been shown to make effective use of thousands of processors.

  2. Deep Brain Stimulation: A Paradigm Shifting Approach to Treat Parkinson's Disease.

    PubMed

    Hickey, Patrick; Stacy, Mark

    2016-01-01

    Parkinson disease (PD) is a chronic and progressive movement disorder classically characterized by slowed voluntary movements, resting tremor, muscle rigidity, and impaired gait and balance. Medical treatment is highly successful early on, though the majority of people experience significant complications in later stages. In advanced PD, when medications no longer adequately control motor symptoms, deep brain stimulation (DBS) offers a powerful therapeutic alternative. DBS involves the surgical implantation of one or more electrodes into specific areas of the brain, which modulate or disrupt abnormal patterns of neural signaling within the targeted region. Outcomes are often dramatic following DBS, with improvements in motor function and reductions in motor complications having been repeatedly demonstrated. Given such robust responses, emerging indications for DBS are being investigated. In parallel with expansions of therapeutic scope, advancements within the areas of neurosurgical technique and the precision of stimulation delivery have recently broadened as well. This review focuses on the revolutionary addition of DBS to the therapeutic armamentarium for PD, and summarizes the technological advancements in the areas of neuroimaging and biomedical engineering intended to improve targeting, programming, and overall management.

  3. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit.

    PubMed

    Badal, Andreu; Badano, Aldo

    2009-11-01

    It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: the use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA™ programming model (NVIDIA Corporation, Santa Clara, CA). An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speedup factor was obtained using a GPU compared to a single core CPU. The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
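
    The acceleration rests on photon histories being statistically independent, so each history can be tracked by its own thread. The C/OpenMP toy below shows that structure only; it is not the PENELOPE/CUDA code, it models a single exponential free path through a slab rather than real transport physics, and the attenuation coefficient and slab thickness are made up.

        #include <stdio.h>
        #include <stdlib.h>
        #include <math.h>
        #include <omp.h>

        #define N_PHOTONS 1000000
        #define MU   0.5            /* attenuation coefficient, 1/cm (made up) */
        #define SLAB 10.0           /* slab thickness, cm (made up) */

        int main(void)
        {
            long transmitted = 0;

            /* Histories are independent, which is why Monte Carlo transport
               maps so well onto thousands of GPU threads; here one CPU
               thread handles a chunk of histories with its own RNG stream. */
            #pragma omp parallel reduction(+:transmitted)
            {
                /* POSIX rand_r gives each thread a private random stream. */
                unsigned int seed = 1234u + (unsigned int)omp_get_thread_num();
                #pragma omp for
                for (long i = 0; i < N_PHOTONS; i++) {
                    /* Sample a free path from the exponential distribution. */
                    double xi = (rand_r(&seed) + 1.0) / ((double)RAND_MAX + 2.0);
                    double path = -log(xi) / MU;
                    if (path > SLAB) transmitted++;  /* interacts otherwise */
                }
            }
            printf("uncollided fraction: %f\n",
                   (double)transmitted / N_PHOTONS);
            return 0;
        }

    The printed fraction should approach exp(-MU*SLAB); the point is only the per-history parallel structure, not the physics.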

  5. PISCES: An environment for parallel scientific computation

    NASA Technical Reports Server (NTRS)

    Pratt, T. W.

    1985-01-01

    The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.
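
    The report's iterative-solution example is given in Pisces FORTRAN and is not reproduced in this record. As a rough, hypothetical stand-in for an iterative solver of a system of equations, here is a Jacobi sweep for a 1-D Poisson-type tridiagonal system in C with OpenMP; the grid size, right-hand side, and iteration count are all invented.

        #include <stdio.h>
        #include <string.h>
        #include <omp.h>

        #define N     1000
        #define ITERS 5000   /* arbitrary; Jacobi converges slowly */

        int main(void)
        {
            static double u[N], unew[N], f[N];
            memset(u, 0, sizeof u);
            for (int i = 0; i < N; i++) f[i] = 1.0;
            double h2 = 1.0 / ((N - 1.0) * (N - 1.0));  /* grid spacing^2 */

            for (int it = 0; it < ITERS; it++) {
                /* Each interior point depends only on the previous iterate,
                   so the sweep parallelizes cleanly across threads. */
                #pragma omp parallel for
                for (int i = 1; i < N - 1; i++)
                    unew[i] = 0.5 * (u[i - 1] + u[i + 1] + h2 * f[i]);
                memcpy(u, unew, sizeof u);
            }
            printf("u[N/2] = %f\n", u[N / 2]);
            return 0;
        }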

  6. Adult Offenders' Perceptions of Rehabilitation Programs in Africa

    ERIC Educational Resources Information Center

    Ngozwana, Nomazulu

    2017-01-01

    This article reflects on adult offenders' perceptions of rehabilitation programs in Africa. It also evaluates whether offenders are consulted when planning rehabilitation programs. Adult education principles were used as a lens to understand offenders' perceptions of rehabilitation programs. Using an interpretive paradigm and qualitative approach,…

  7. Eigensolver for a Sparse, Large Hermitian Matrix

    NASA Technical Reports Server (NTRS)

    Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

    2003-01-01

    A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).

  8. Parallelization of elliptic solver for solving 1D Boussinesq model

    NASA Astrophysics Data System (ADS)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver in solving the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by implementing a staggered grid scheme to the continuity, momentum, and elliptic equations of the Boussinesq model. The tridiagonal system emerging from the numerical scheme of the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two test cases of numerical experiment, i.e. propagation of a solitary and a standing wave, are proposed to evaluate the parallel program. The numerical results are verified with the analytical solutions of the solitary and standing waves. The best speedups for the solitary and standing wave test cases are about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, both obtained using 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
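
    Cyclic reduction parallelizes well because, at every reduction level, all equation updates are mutually independent. The C/OpenMP sketch below solves a tridiagonal test system of size 2^20 - 1 this way; it is a generic textbook cyclic reduction, not the authors' Boussinesq solver, and the test matrix and right-hand side are fabricated so the answer can be checked.

        #include <stdio.h>
        #include <math.h>
        #include <omp.h>

        #define N 1048575   /* 2^20 - 1: cyclic reduction wants n = 2^q - 1 */

        static double a[N], b[N], c[N], d[N], x[N];

        int main(void)
        {
            /* Test system -x[i-1] + 2 x[i] - x[i+1] = d[i], built so that
               the exact solution is x[i] = i + 1. */
            for (int i = 0; i < N; i++) {
                a[i] = (i == 0)     ? 0.0 : -1.0;
                c[i] = (i == N - 1) ? 0.0 : -1.0;
                b[i] = 2.0;
                double xm = (i == 0)     ? 0.0 : (double)i;       /* x[i-1] */
                double xp = (i == N - 1) ? 0.0 : (double)(i + 2); /* x[i+1] */
                d[i] = a[i] * xm + b[i] * (i + 1) + c[i] * xp;
            }

            /* Forward reduction: at each level, the updated equations read
               only untouched neighbors, so the loop is race-free. */
            for (int half = 1; half < (N + 1) / 2; half *= 2) {
                int stride = 2 * half;
                #pragma omp parallel for
                for (int i = stride - 1; i < N; i += stride) {
                    double al = -a[i] / b[i - half];
                    double be = -c[i] / b[i + half];
                    b[i] += al * c[i - half] + be * a[i + half];
                    d[i] += al * d[i - half] + be * d[i + half];
                    a[i]  = al * a[i - half];
                    c[i]  = be * c[i + half];
                }
            }

            int mid = N / 2;               /* single remaining equation */
            x[mid] = d[mid] / b[mid];

            /* Back substitution, again with one parallel loop per level. */
            for (int half = (N + 1) / 4; half >= 1; half /= 2) {
                int stride = 2 * half;
                #pragma omp parallel for
                for (int i = half - 1; i < N; i += stride) {
                    double xl = (i - half >= 0) ? x[i - half] : 0.0;
                    double xr = (i + half < N) ? x[i + half] : 0.0;
                    x[i] = (d[i] - a[i] * xl - c[i] * xr) / b[i];
                }
            }

            double err = 0.0;
            for (int i = 0; i < N; i++)
                err = fmax(err, fabs(x[i] - (double)(i + 1)));
            printf("max error: %g\n", err);
            return 0;
        }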

  9. 3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

    1997-12-31

    The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle. The program was developed on the 8-processor CS MP-3 made in VNIIEF and was adapted to the massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different numbers of processors up to 256, and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.

  10. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2001-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  11. Relative Debugging of Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2002-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  12. A software tool for dataflow graph scheduling

    NASA Technical Reports Server (NTRS)

    Jones, Robert L., III

    1994-01-01

    A graph-theoretic design process and software tool is presented for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described using a dataflow graph and are intended to be executed repetitively on multiple processors. The dataflow paradigm is very useful in exposing the parallelism inherent in algorithms. It provides a graphical and mathematical model which describes a partial ordering of algorithm tasks based on data precedence.
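
    The data-precedence partial order is what a scheduler consumes. As a hedged sketch, not the paper's tool, the C program below applies Kahn's topological-ordering algorithm to a small, made-up dataflow graph, grouping tasks into levels whose members could execute on different processors simultaneously.

        #include <stdio.h>

        #define N_TASKS 5

        int main(void)
        {
            /* Hypothetical dataflow graph: dep[i][j] = 1 means task i
               produces data consumed by task j (a partial order). */
            int dep[N_TASKS][N_TASKS] = {
                {0,1,1,0,0},   /* 0 feeds 1 and 2 */
                {0,0,0,1,0},   /* 1 feeds 3 */
                {0,0,0,1,0},   /* 2 feeds 3 */
                {0,0,0,0,1},   /* 3 feeds 4 */
                {0,0,0,0,0}
            };
            int indeg[N_TASKS] = {0};
            for (int i = 0; i < N_TASKS; i++)
                for (int j = 0; j < N_TASKS; j++)
                    indeg[j] += dep[i][j];

            /* Kahn's algorithm: every task whose inputs are all ready forms
               one "level"; tasks in the same level can run in parallel. */
            int done[N_TASKS] = {0}, remaining = N_TASKS, level = 0;
            while (remaining > 0) {
                printf("level %d:", level++);
                int ready[N_TASKS], nready = 0;
                for (int j = 0; j < N_TASKS; j++)
                    if (!done[j] && indeg[j] == 0) ready[nready++] = j;
                if (nready == 0) { printf(" cycle detected\n"); break; }
                for (int k = 0; k < nready; k++) {
                    int j = ready[k];
                    printf(" task %d", j);
                    done[j] = 1;
                    remaining--;
                    for (int m = 0; m < N_TASKS; m++)
                        if (dep[j][m]) indeg[m]--;
                }
                printf("\n");
            }
            return 0;
        }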

  13. Paralex: An Environment for Parallel Programming in Distributed Systems

    DTIC Science & Technology

    1991-12-07

    distributed systems is comparable to assembly language programming for traditional sequential systems - the user must resort to low-level primitives to accomplish data encoding/decoding, communication, remote execution, synchronization, failure detection and recovery. It is our belief that... synchronization. Finally, composing parallel programs by interconnecting sequential computations allows automatic support for heterogeneity and fault tolerance

  14. Interfacing Computer Aided Parallelization and Performance Analysis

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)

    2003-01-01

    When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.

  15. LDRD final report on massively-parallel linear programming : the parPCx system.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

    2005-02-01

    This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Integer and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.

  16. Multimission Software Reuse in an Environment of Large Paradigm Shifts

    NASA Technical Reports Server (NTRS)

    Wilson, Robert K.

    1996-01-01

    The ground data systems provided for NASA space mission support are discussed. As space missions expand, the ground systems requirements become more complex. Current ground data systems provide for telemetry, command, and uplink and downlink processing capabilities. The New Millennium Project (NMP) technology testbed for 21st century NASA missions is discussed. The program demonstrates spacecraft and ground system technologies. The paradigm shift from detailed ground sequencing to a goal oriented planning approach is considered. The work carried out to meet this paradigm for the Deep Space-1 (DS-1) mission is outlined.

  17. Olfactory Cued Learning Paradigm.

    PubMed

    Liu, Gary; McClard, Cynthia K; Tepe, Burak; Swanson, Jessica; Pekarek, Brandon; Panneerselvam, Sugi; Arenkiel, Benjamin R

    2017-05-05

    Sensory stimulation leads to structural changes within the CNS (Central Nervous System), thus providing the fundamental mechanism for learning and memory. The olfactory circuit offers a unique model for studying experience-dependent plasticity, partly due to a continuous supply of integrating adult born neurons. Our lab has recently implemented an olfactory cued learning paradigm in which specific odor pairs are coupled to either a reward or punishment to study downstream circuit changes. The following protocol outlines the basic set up for our learning paradigm. Here, we describe the equipment setup, programming of software, and method of behavioral training.

  18. A high-speed linear algebra library with automatic parallelism

    NASA Technical Reports Server (NTRS)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  19. Dual and parallel postdoctoral training programs: implications for the osteopathic medical profession.

    PubMed

    Burkhart, Diane N; Lischka, Terri A

    2011-04-01

    Students in colleges of osteopathic medicine have several options when considering postdoctoral training programs. In addition to training programs approved solely by the American Osteopathic Association or accredited solely by the Accreditation Council for Graduate Medical Education (ACGME), students can pursue programs accredited by both organizations (ie, dually accredited programs) or osteopathic programs that occur side-by-side with ACGME programs (ie, parallel programs). In the present article, we report on the availability and growth of these 2 training options and describe their benefits and drawbacks for trainees and the osteopathic medical profession as a whole.

  20. OpenCL based machine learning labeling of biomedical datasets

    NASA Astrophysics Data System (ADS)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

    In this paper, we propose a two-stage labeling method for large biomedical datasets through a parallel approach on a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, providing an automatic and interactive method to label or tag the different structures contained in the input data becomes imperative. Several approaches to label or segment biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to let a non-expert user easily analyze biomedical datasets. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, allowing parallel programming paradigms to be applied in conventional personal computers. The Adaboost classifier is one of the most widely applied methods for labeling in the Machine Learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied on the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and labeling speeds.
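
    Because the strong classifier is just a weighted vote of per-feature decision stumps, the testing stage is data-parallel over samples. The C/OpenMP sketch below shows that structure with invented stump parameters and dummy data; it stands in for the paper's OpenCL kernel, with one loop iteration playing the role of one GPU work-item.

        #include <stdio.h>
        #include <omp.h>

        #define N_SAMPLES  8
        #define N_FEATURES 4
        #define N_WEAK     3

        /* A weak classifier here is a decision stump on one feature:
           sign(polarity * (x[feature] - threshold)), weighted by alpha.
           All stump parameters below are made up for illustration. */
        typedef struct { int feature; double threshold; int polarity; double alpha; } Stump;

        int main(void)
        {
            Stump stumps[N_WEAK] = {
                {0, 0.5, +1, 0.9}, {2, 1.2, -1, 0.6}, {3, 0.1, +1, 0.4}
            };
            double x[N_SAMPLES][N_FEATURES];
            int label[N_SAMPLES];
            for (int i = 0; i < N_SAMPLES; i++)
                for (int f = 0; f < N_FEATURES; f++)
                    x[i][f] = (double)((i + f) % 3) * 0.5;   /* dummy data */

            /* Testing stage: each unlabeled sample is scored independently,
               so the outer loop maps directly onto GPU work-items; OpenMP
               stands in for OpenCL here. */
            #pragma omp parallel for
            for (int i = 0; i < N_SAMPLES; i++) {
                double score = 0.0;
                for (int w = 0; w < N_WEAK; w++) {
                    Stump s = stumps[w];
                    double v = s.polarity * (x[i][s.feature] - s.threshold);
                    score += (v >= 0.0) ? s.alpha : -s.alpha;
                }
                label[i] = (score >= 0.0) ? 1 : -1;
            }
            for (int i = 0; i < N_SAMPLES; i++)
                printf("sample %d -> %+d\n", i, label[i]);
            return 0;
        }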

  1. The Paradigm Recursion: Is It More Accessible When Introduced in Middle School?

    ERIC Educational Resources Information Center

    Gunion, Katherine; Milford, Todd; Stege, Ulrike

    2009-01-01

    Recursion is a programming paradigm as well as a problem solving strategy thought to be very challenging to grasp for university students. This article outlines a pilot study, which expands the age range of students exposed to the concept of recursion in computer science through instruction in a series of interesting and engaging activities. In…

  2. Sustainability or Limitless Expansion: Paradigm Shift in HRD Practice and Teaching

    ERIC Educational Resources Information Center

    Ardichvili, Alexandre

    2012-01-01

    Purpose: This paper aims to discuss a shift from the mentality of limitless growth and expansion to the new sustainability paradigm in HRD practice, and identifies what corresponding changes are needed in human resource development (HRD) university programs. Design/methodology/approach: The paper is based on a review of the literature in HRD and…

  3. Using the Natural Language Paradigm (NLP) to Increase Vocalizations of Older Adults with Cognitive Impairments

    ERIC Educational Resources Information Center

    LeBlanc, Linda A.; Geiger, Kaneen B.; Sautter, Rachael A.; Sidener, Tina M.

    2007-01-01

    The Natural Language Paradigm (NLP) has proven effective in increasing spontaneous verbalizations for children with autism. This study investigated the use of NLP with older adults with cognitive impairments served at a leisure-based adult day program for seniors. Three individuals with limited spontaneous use of functional language participated…

  4. Working Toward the Experimenter: Reconceptualizing Obedience Within the Milgram Paradigm as Identification-Based Followership.

    PubMed

    Reicher, Stephen D; Haslam, S Alexander; Smith, Joanne R

    2012-07-01

    The behavior of participants within Milgram's obedience paradigm is commonly understood to arise from the propensity to cede responsibility to those in authority and hence to obey them. This parallels a belief that brutality in general arises from passive conformity to roles. However, recent historical and social psychological research suggests that agents of tyranny actively identify with their leaders and are motivated to display creative followership in working toward goals that they believe those leaders wish to see fulfilled. Such analysis provides the basis for reinterpreting the behavior of Milgram's participants. It is supported by a range of material, including evidence that the willingness of participants to administer 450-volt shocks within the Milgram paradigm changes dramatically, but predictably, as a function of experimental variations that condition participants' identification with either the experimenter and the scientific community that he represents or the learner and the general community that he represents. This reinterpretation also encourages us to see Milgram's studies not as demonstrations of conformity or obedience, but as explorations of the power of social identity-based leadership to induce active and committed followership. © The Author(s) 2012.

  5. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.

  6. Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.

  7. Exploiting loop level parallelism in nonprocedural dataflow programs

    NASA Technical Reports Server (NTRS)

    Gokhale, Maya B.

    1987-01-01

    Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.

  8. Tolerant (parallel) Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2^3 is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

  9. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    PubMed

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of a large molecule with a high-quality basis set running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
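
    No X10 source is included in this record, so as a loose CPU-side analogy only (C/OpenMP rather than X10, with a fabricated cost model, and not any of the four schemes the authors evaluate), the sketch below shows why dynamic assignment of integral batches balances load when batch costs vary: idle threads pull the next batch instead of receiving a fixed slice up front.

        #include <stdio.h>
        #include <omp.h>

        #define N_BATCHES 64

        /* Stand-in for an integral batch whose cost varies strongly from
           batch to batch, the reason static partitioning load-imbalances
           and work stealing or dynamic scheduling pays off. */
        static double fake_integral_batch(int b)
        {
            double s = 0.0;
            long work = 10000L * (1 + b % 7);  /* uneven, made-up cost */
            for (long k = 0; k < work; k++)
                s += 1.0 / (1.0 + b + k);
            return s;
        }

        int main(void)
        {
            double total = 0.0;
            /* schedule(dynamic,1): idle threads grab the next batch, a
               CPU-side analogue of a work-stealing runtime. */
            #pragma omp parallel for schedule(dynamic, 1) reduction(+:total)
            for (int b = 0; b < N_BATCHES; b++)
                total += fake_integral_batch(b);
            printf("accumulated value: %f\n", total);
            return 0;
        }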

  10. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

  11. Exploring types of play in an adapted robotics program for children with disabilities.

    PubMed

    Lindsay, Sally; Lam, Ashley

    2018-04-01

    Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, where all but one showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remainder of the children showed mixed play types (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative, and less solitary, play as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation: Educators and clinicians working with children who have disabilities should consider the potential of LEGO® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence their ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.

  12. New architectural paradigms for multi-petabyte distributed storage systems

    NASA Technical Reports Server (NTRS)

    Lee, Richard R.

    1994-01-01

    In the not too distant future, programs such as NASA's Earth Observing System, NSF/ARPA/NASA's Digital Libraries Initiative and the Intelligence Community's (NSA, CIA, NRO, etc.) mass storage system upgrades will all require multi-petabyte (petabyte: 10^15 bytes of bitfile data) or larger distributed storage solutions. None of these requirements, as currently defined, will meet their objectives utilizing either today's architectural paradigms or storage solutions. Radically new approaches will be required to not only store and manage veritable 'mountain ranges of data', but to make the cost of ownership affordable, much less practical, in today's (and certainly the future's) austere budget environment! Within this paper we will explore new architectural paradigms and project systems performance benefits and dollars per petabyte of information stored. We will discuss essential 'top down' approaches to achieving an overall systems level performance capability sufficient to meet the challenges of these major programs.

  13. Chronic ethanol exposure combined with high fat diet up-regulates P2X7 receptors that parallels neuroinflammation and neuronal loss in C57BL/6J mice

    PubMed Central

    Asatryan, Liana; Khoja, Sheraz; Rodgers, Kathleen E; Alkana, Ronald L; Tsukamoto, Hidekatsu; Davies, Daryl L.

    2015-01-01

    The present investigation tested the role of ATP-activated P2X7 receptors (P2X7Rs) in alcohol-induced brain damage using a model that combines intragastric (iG) ethanol feeding and high fat diet in C57BL/6J mice (Hybrid). The Hybrid paradigm caused increased levels of pro-inflammatory markers, changes in microglia and astrocytes, reduced levels of neuronal marker NeuN and increased P2X7R expression in ethanol-sensitive brain regions. Observed changes in P2X7R and NeuN expression were more pronounced in Hybrid paradigm with inclusion of additional weekly binges. In addition, high fat diet during Hybrid exposure aggravated the increase in P2X7R expression and activation of glial cells. PMID:26198936

  14. On the transmission of partial information: inferences from movement-related brain potentials

    NASA Technical Reports Server (NTRS)

    Osman, A.; Bashore, T. R.; Coles, M. G.; Donchin, E.; Meyer, D. E.

    1992-01-01

    Results are reported from a new paradigm that uses movement-related brain potentials to detect response preparation based on partial information. The paradigm uses a hybrid choice-reaction go/nogo procedure in which decisions about response hand and whether to respond are based on separate stimulus attributes. A lateral asymmetry in the movement-related brain potential was found on nogo trials without overt movement. The direction of this asymmetry depended primarily on the signaled response hand rather than on properties of the stimulus. When the asymmetry first appeared was influenced by the time required to select the signaled hand, and when it began to differ on go and nogo trials was influenced by the time to decide whether to respond. These findings indicate that both stimulus attributes were processed in parallel and that the asymmetry reflected preparation of the response hand that began before the go/nogo decision was completed.

  15. Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

    USDA-ARS?s Scientific Manuscript database

    This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...

  16. The Biopsychology-Toolbox: a free, open-source Matlab-toolbox for the control of behavioral experiments.

    PubMed

    Rose, Jonas; Otto, Tobias; Dittrich, Lars

    2008-10-30

    The Biopsychology-Toolbox is a free, open-source Matlab-toolbox for the control of behavioral experiments. The major aim of the project was to provide a set of basic tools that allow programming novices to control basic hardware used for behavioral experimentation without limiting the power and flexibility of the underlying programming language. The modular design of the toolbox allows portation of parts as well as entire paradigms between different types of hardware. In addition to the toolbox, this project offers a platform for the exchange of functions, hardware solutions and complete behavioral paradigms.

  17. Publish or perish, and pay--the new paradigm of open-access journals.

    PubMed

    Tzarnas, Stephanie; Tzarnas, Chris D

    2015-01-01

    The new open-access journal business model is changing the publication landscape and residents and junior faculty should be aware of these changes. A national survey of surgery program directors and residents was performed. Open-access journals have been growing over the past decade, and many traditional printed journals are also sponsoring open-access options (the hybrid model) for accepted articles. Authors need to be aware of the new publishing paradigm and potential costs involved in publishing their work. Copyright © 2014 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.

  18. Sex Trafficking: Policies, Programs, and Services.

    PubMed

    Orme, Julie; Ross-Sheriff, Fariyal

    2015-10-01

    Sex trafficking (ST), a contemporary form of female slavery, is a human rights issue of critical concern to social work. The global response to ST has been substantial, and 166 countries have adopted anti-ST legislation. Despite considerable efforts to combat ST, the magnitude is increasing. To date, the majority of anti-ST efforts have focused on criminalization policies that target traffickers or purchasers of sexual services, who are predominantly male; prevention programming and services for predominantly female victims have received less support. Therapeutic services to assist pornography addicts and purchasers of sexual services are also necessary. In this article, authors examine current anti-ST policies, programs, and services, both domestically and globally, and present an innovative paradigm that addresses social inequities and emphasizes prevention programming. They conclude with a discussion of the paradigm's implications for social work policies, practices, and services.

  19. Conjugal conflict and violence: a review and theoretical paradigm.

    PubMed

    Smilkstein, G; Aspy, C B; Quiggins, P A

    1994-02-01

    Conjugal violence has been described as having multiple etiologies. The variables are so numerous that intervention and research protocols are difficult to effect. This paper proposes a paradigm that establishes conjugal conflict and violence as separate entities. According to the paradigm, conjugal conflict is viewed as "an inevitable part of human association," whereas conjugal violence is determined to be a learned behavioral tactic that is employed as a coping strategy when an individual's conflict threshold potential is exceeded. Evidence will be offered that violence is learned from family of origin and from observing what is common or accepted practice in the community. Use of this paradigm would give primacy to community education programs that advance the concept of conflict resolution through rational discourse.

  20. The Fate of the Method of 'Paradigms' in Paleobiology.

    PubMed

    Rudwick, Martin J S

    2017-11-02

    An earlier article described the mid-twentieth century origins of the method of "paradigms" in paleobiology, as a way of making testable hypotheses about the functional morphology of extinct organisms. The present article describes the use of "paradigms" through the 1970s and, briefly, to the end of the century. After I had proposed the paradigm method to help interpret the ecological history of brachiopods, my students developed it in relation to that and other invertebrate phyla, notably in Euan Clarkson's analysis of vision in trilobites. David Raup's computer-aided "theoretical morphology" was then combined with my functional or adaptive emphasis, in Adolf Seilacher's tripartite "constructional morphology." Stephen Jay Gould, who had strongly endorsed the method, later switched to criticizing the "adaptationist program" he claimed it embodied. Although the explicit use of paradigms in paleobiology had declined by the end of the century, the method was tacitly subsumed into functional morphology as "biomechanics."

  1. Multiprocessor smalltalk: Implementation, performance, and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pallas, J.I.

    1990-01-01

    Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possible to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools - serialization, replication, and reorganization - adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.

  2. Parallel Unsteady Overset Mesh Methodology for a Multi-Solver Paradigm with Adaptive Cartesian Grids

    DTIC Science & Technology

    2008-08-21

    ... Good linear scalability was observed for all three cases up to 12 processors. Beyond that the scalability drops off depending on grid ... Research Laboratory for the usage of SUGGAR module and Yikloon Lee at NAVAIR for the usage of the NAVAIR-IHC code.

  3. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    NASA Astrophysics Data System (ADS)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
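
    The PM sources themselves are available at www.pm-lang.org and are not reproduced here. As a minimal sketch of the underlying two-tier idea only (plain C, not PM, and not its actual runtime), the program below maps the outer tier onto MPI ranks and the inner tier onto OpenMP threads, and, like the implementation described above, avoids assuming a thread-safe MPI by keeping all MPI calls outside the threaded region.

        #include <stdio.h>
        #include <omp.h>
        #include <mpi.h>

        int main(int argc, char **argv)
        {
            /* Request funneled thread support: OpenMP threads coexist with
               MPI, but only the master thread ever calls MPI. */
            int provided, rank, size;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            double local = 0.0;
            /* Tier 2: tasks within one "processor" run as OpenMP threads. */
            #pragma omp parallel reduction(+:local)
            {
                local += 1.0;   /* stand-in for a per-task computation */
            }

            /* Tier 1: abstract processors map to MPI ranks; communication
               between them happens only outside the threaded region. */
            double total = 0.0;
            MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                       MPI_COMM_WORLD);
            if (rank == 0)
                printf("%d ranks, %g tasks total\n", size, total);
            MPI_Finalize();
            return 0;
        }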

  4. Planning in subsumption architectures

    NASA Technical Reports Server (NTRS)

    Chalfant, Eugene C.

    1994-01-01

    A subsumption planner using a parallel distributed computational paradigm based on the subsumption architecture for control of real-world-capable robots is described. Virtual sensor state space is used as a planning tool to visualize the robot's anticipated effect on its environment. Decision sequences are generated based on the environmental situation expected at the time the robot must commit to a decision. Between decision points, the robot performs in a preprogrammed manner. A rudimentary, domain-specific partial world model contains enough information to extrapolate the end results of the rote behavior between decision points. A collective network of predictors operates in parallel with the reactive network, forming a recurrent network which generates plans as a hierarchy. Details of a plan segment are generated only when its execution is imminent. The use of the subsumption planner is demonstrated by a simple maze navigation problem.

  5. The Multiple Abilities Paradigm: Integrated General and Special Education Teacher Preparation.

    ERIC Educational Resources Information Center

    Ellis, Edwin S.; And Others

    1995-01-01

    The Multiple Abilities Program (MAP) at the University of Alabama is a five-semester, competency-based preservice program preparing teachers to teach all students regardless of settings or disability labels. This article outlines the program rationale, organizational framework, and the program feature in which undergraduates spend over 50 percent…

  6. Curricular Ethics in Early Childhood Education Programming: A Challenge to the Ontario Kindergarten Program

    ERIC Educational Resources Information Center

    Heydon, Rachel M.; Wang, Ping

    2006-01-01

    Through a case study of a key Canadian early childhood education program, The Kindergarten Program (Ontario Ministry of Education and Training, 1998a), we explore the relationship between curricular paradigms and early childhood education (ECE) models, and the opportunities that each creates for enacting ethical teaching and learning…

  7. Roles of Variables in Three Programming Paradigms

    ERIC Educational Resources Information Center

    Sajaniemi, J.; Ben-Ari, M.; Byckling, P.; Gerdt, P.; Kulikova, Y.

    2006-01-01

    Roles can be assigned to occurrences of variables in programs according to a small number of stereotypical patterns of use. Studies on explicitly teaching roles to novices learning programming have shown that roles are an excellent pedagogical tool for clarifying the structure and meaning of programs and that their use improves students'…

  8. Parallel transformation of K-SVD solar image denoising algorithm

    NASA Astrophysics Data System (ADS)

    Liang, Youwen; Tian, Yu; Li, Mei

    2017-02-01

    Images of the sun obtained through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and to the complexity of the training algorithms. In this paper, OpenMP is used to transform the serial algorithm into a parallel version. A data-parallelism model is used to transform the algorithm; the biggest change is that multiple atoms, rather than one, are updated simultaneously. The denoising effect and acceleration performance were tested after completion of the parallel algorithm. The speedup of the program is 13.563 when using 16 cores. The parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easily ported to other multi-core platforms.
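    A minimal sketch of the kind of loop-level OpenMP parallelization the abstract describes, assuming the dictionary-update stage has been reorganized so that several atoms can be updated independently; update_atom is a hypothetical placeholder for the per-atom update, not a routine from the paper.

        /* Hypothetical sketch: update independent K-SVD atoms concurrently. */
        void update_atom(double *atom);   /* per-atom rank-1 update, assumed given */

        void update_dictionary(double **atoms, int n_atoms)
        {
            /* Each iteration updates one atom; iterations are assumed independent,
               matching the paper's "multiple atoms updated simultaneously". */
            #pragma omp parallel for schedule(dynamic)
            for (int k = 0; k < n_atoms; k++)
                update_atom(atoms[k]);
        }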

  9. Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

    PubMed

    Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

    2016-01-01

    Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.

  10. Array processor architecture

    NASA Technical Reports Server (NTRS)

    Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

    1983-01-01

    A high-speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors normally operating quite independently of each other in a multiprocessing fashion. For data-dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not in lock-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.

  11. A parallel solver for huge dense linear systems

    NASA Astrophysics Data System (ADS)

    Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

    2011-11-01

    HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) that facilitates the parallel solution of very large dense linear systems for scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage secondary memory in order to solve huge linear systems, on the order of 100,000 equations. The API is based on the parallel linear algebra library PLAPACK and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes, and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors.
    New version program summary:
    Program title: Huge Dense System Solver (HDSS)
    Catalogue identifier: AEHU_v1_1
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 87 062
    No. of bytes in distributed program, including test data, etc.: 1 069 110
    Distribution format: tar.gz
    Programming language: Fortran 90, C
    Computer: Parallel architectures: multiprocessors, computer clusters
    Operating system: Linux/Unix
    Has the code been vectorized or parallelized?: Yes, includes MPI primitives
    RAM: Tested for up to 190 GB
    Classification: 6.5
    External routines: MPI (http://www.mpi-forum.org/), BLAS (http://www.netlib.org/blas/), PLAPACK (http://www.cs.utexas.edu/~plapack/), POOCLAPACK (ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution)
    Catalogue identifier of previous version: AEHU_v1_0
    Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533
    Does the new version supersede the previous version?: Yes
    Nature of problem: Huge-scale dense systems of linear equations, Ax = B, beyond standard LAPACK capabilities.
    Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary-storage algorithms when the available main memory is insufficient.
    Reasons for new version: In many applications we need to guarantee high accuracy in the solution of very large linear systems, and we can do so by using double-precision arithmetic.
    Summary of revisions: Version 1.1 can be used to solve linear systems using double-precision arithmetic. New version of the initialization routine: the user can choose the kind of arithmetic and the values of several parameters of the environment.
    Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.

  12. Turbomachinery computational fluid dynamics: asymptotes and paradigm shifts.

    PubMed

    Dawes, W N

    2007-10-15

    This paper reviews the development of computational fluid dynamics (CFD) specifically for turbomachinery simulations, with a particular focus on application to problems with complex geometry. The review is structured by considering this development as a series of paradigm shifts, followed by asymptotes. The original S1-S2 blade-blade-throughflow model is briefly described, followed by the development of two-dimensional then three-dimensional blade-blade analysis. This in turn evolved from inviscid to viscous analysis and then from steady to unsteady flow simulations. This development trajectory led, over a surprisingly small number of years, to an accepted approach: a 'CFD orthodoxy'. A very important current area of intense interest and activity in turbomachinery simulation is in accounting for real geometry effects, not just in the secondary air and turbine cooling systems but also in those associated with the primary path. The requirements here are threefold: capturing and representing these geometries in a computer model; making rapid design changes to these complex geometries; and managing the very large associated computational models on PC clusters. Accordingly, the challenges in the application of the current CFD orthodoxy to complex geometries are described in some detail. The main aim of this paper is to argue that the current CFD orthodoxy is on a new asymptote, is not in fact suited for application to complex geometries, and that a paradigm shift must be sought. In particular, the new paradigm must be geometry centric and inherently parallel without serial bottlenecks. The main contribution of this paper is to describe such a potential paradigm shift, inspired by the animation industry, based on a fundamental shift in perspective from explicit to implicit geometry, and then illustrate this with a number of applications to turbomachinery.
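    The shift from explicit to implicit geometry that the paper advocates can be illustrated with signed distance fields, where a solid is a scalar field (negative inside, positive outside) and Boolean operations reduce to min/max of fields; the sketch below is a generic illustration of that representation, not code from the paper.

        /* Implicit geometry sketch: solids as signed distance fields, so
           union and intersection need no explicit surface meshing. */
        #include <math.h>

        typedef struct { double x, y, z; } vec3;

        double sd_sphere(vec3 p, vec3 c, double r)
        {
            double dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
            return sqrt(dx*dx + dy*dy + dz*dz) - r;   /* negative inside */
        }

        double sd_union(double d1, double d2)        { return fmin(d1, d2); }
        double sd_intersection(double d1, double d2) { return fmax(d1, d2); }
        /* A point p lies inside the combined solid iff its distance is negative. */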

  13. Changing Perceptions of Flooding and Stormwater as a Driver of Urban Hydrology and Biogeochemistry

    NASA Astrophysics Data System (ADS)

    Hale, R. L.

    2015-12-01

    Urbanization can have detrimental impacts on downstream ecosystems due to its effects on hydrological and biogeochemical cycles. In particular, how urban stormwater systems are designed has implications for flood regimes and biogeochemical transformations. Flood and stormwater management paradigms have shifted over time at large scales, but patterns and drivers of local stormwater infrastructure designs are unknown. We describe patterns of infrastructure design and use over the 20th century in three cities along an urbanization gradient in Utah: Salt Lake, Logan, and Heber City. To understand changes in stormwater management paradigms, we conducted a historical media content analysis of newspaper articles related to flooding and stormwater in Salt Lake City from 1900 to 2012. Stormwater infrastructure design varied spatially and temporally, both within and among cities. All three cities transitioned from agricultural to urban land use, and legacies were evident in the use of agricultural canals for stormwater conveyance. Salt Lake City infrastructure transitioned from centralized storm sewers during early urbanization to decentralized detention systems in the 1970s. In contrast, the newer cities, Logan and Heber, saw parallel increases in conveyance and detention systems with urbanization. The media analysis revealed significant changes in flood and stormwater management paradigms over the 20th century that were driven by complex factors including top-down regulations, local disturbances, and funding constraints. Early management paradigms focused on infrastructural solutions to address problems with private and public property damage, whereas more recent paradigms focus on behavioral solutions to flooding and green infrastructure solutions to prevent negative impacts of urban stormwater on local ecosystems. Changes in human perceptions of the environment can affect how we design urban ecosystems, with important implications for ecological functions.

  14. On the costs of parallel processing in dual-task performance: The case of lexical processing in word production.

    PubMed

    Paucke, Madlen; Oppermann, Frank; Koch, Iring; Jescheniak, Jörg D

    2015-12-01

    Previous dual-task picture-naming studies suggest that lexical processes require capacity-limited processes and prevent other tasks from being carried out in parallel. However, studies involving the processing of multiple pictures suggest that parallel lexical processing is possible. The present study investigated the specific costs that may arise when such parallel processing occurs. We used a novel dual-task paradigm by presenting 2 visual objects associated with different tasks and manipulating between-task similarity. With high similarity, a picture-naming task (T1) was combined with a phoneme-decision task (T2), so that lexical processes were shared across tasks. With low similarity, picture naming was combined with a size-decision T2 (nonshared lexical processes). In Experiment 1, we found that a manipulation of lexical processes (lexical frequency of the T1 object name) showed an additive propagation with low between-task similarity and an overadditive propagation with high between-task similarity. Experiment 2 replicated this differential forward propagation of the lexical effect and showed that it disappeared with longer stimulus onset asynchronies. Moreover, both experiments showed backward crosstalk, indexed as worse T1 performance with high between-task similarity compared with low similarity. Together, these findings suggest that conditions of high between-task similarity can lead to parallel lexical processing in both tasks, which, however, does not result in benefits but rather in extra performance costs. These costs can be attributed to crosstalk based on the dual-task binding problem arising from parallel processing. Hence, the present study reveals that capacity-limited lexical processing can run in parallel across dual tasks, but only at the expense of extraordinarily high costs. (c) 2015 APA, all rights reserved.

  15. Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

    NASA Astrophysics Data System (ADS)

    Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

    2016-08-01

    We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which is necessary for the interaction calculation. Even when distributed-memory parallel computers are not used, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used for long-range interactions in order to reduce the amount of computation; for short-range interactions, methods that limit the calculation to neighboring particles are required. FDPS provides all of these functions, which are necessary for efficient parallel execution of particle-based simulations, as "templates" which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9). These are currently limited by the time for the calculation of the domain decomposition and the communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
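    For context, the "simple, sequential and unoptimized program of O(N^2) calculation cost" that such a framework starts from is direct summation over all particle pairs, as in the generic sketch below (an illustration of the problem class, not FDPS's actual template interface).

        /* Direct O(N^2) gravitational interaction kernel; eps2 is a softening
           term that avoids the singularity at zero separation. */
        #include <math.h>

        typedef struct { double x[3], a[3], m; } Particle;

        void compute_gravity(Particle *p, int n, double eps2)
        {
            for (int i = 0; i < n; i++) {
                p[i].a[0] = p[i].a[1] = p[i].a[2] = 0.0;
                for (int j = 0; j < n; j++) {
                    if (j == i) continue;
                    double d[3], r2 = eps2;
                    for (int k = 0; k < 3; k++) {
                        d[k] = p[j].x[k] - p[i].x[k];
                        r2 += d[k] * d[k];
                    }
                    double inv_r = 1.0 / sqrt(r2);
                    double f = p[j].m * inv_r * inv_r * inv_r;   /* m / r^3 */
                    for (int k = 0; k < 3; k++)
                        p[i].a[k] += f * d[k];
                }
            }
        }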

  16. GPURFSCREEN: a GPU based virtual screening tool using random forest classifier.

    PubMed

    Jayaraj, P B; Ajay, Mathias K; Nufail, M; Gopakumar, G; Jaleel, U C A

    2016-01-01

    In-silico methods are an integral part of the modern drug discovery paradigm. Virtual screening, an in-silico method, is used to refine data models and reduce the chemical space on which wet-lab experiments need to be performed. Virtual screening of a ligand data model requires large-scale computations, making it a highly time-consuming task. This process can be sped up by implementing parallelized algorithms on a Graphics Processing Unit (GPU). Random Forest is a robust classification algorithm that can be employed in virtual screening. A ligand-based virtual screening tool (GPURFSCREEN) that uses random forests on GPU systems has been proposed and evaluated in this paper. This tool produces optimized results at a lower execution time for large bioassay data sets. The quality of results produced by our tool on the GPU is the same as that in a regular serial environment. Considering the magnitude of data to be screened, the parallelized virtual screening has a significantly lower running time at high throughput. The proposed parallel tool outperforms its serial counterpart by successfully screening billions of molecules in training and prediction phases.

  17. A preclinical cognitive test battery to parallel the National Institute of Health Toolbox in humans: bridging the translational gap.

    PubMed

    Snigdha, Shikha; Milgram, Norton W; Willis, Sherry L; Albert, Marylin; Weintraub, S; Fortin, Norbert J; Cotman, Carl W

    2013-07-01

    A major goal of animal research is to identify interventions that can promote successful aging and delay or reverse age-related cognitive decline in humans. Recent advances in standardizing cognitive assessment tools for humans have the potential to bring preclinical work closer to human research in aging and Alzheimer's disease. The National Institute of Health (NIH) has led an initiative to develop a comprehensive Toolbox for Neurologic Behavioral Function (NIH Toolbox) to evaluate cognitive, motor, sensory and emotional function for use in epidemiologic and clinical studies spanning 3 to 85 years of age. This paper aims to analyze the strengths and limitations of animal behavioral tests that can be used to parallel those in the NIH Toolbox. We conclude that there are several paradigms available to define a preclinical battery that parallels the NIH Toolbox. We also suggest areas in which new tests may benefit the development of a comprehensive preclinical test battery for assessment of cognitive function in animal models of aging and Alzheimer's disease. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. A preclinical cognitive test battery to parallel the National Institute of Health Toolbox in humans: bridging the translational gap

    PubMed Central

    Snigdha, Shikha; Milgram, Norton W.; Willis, Sherry L.; Albert, Marylin; Weintraub, S.; Fortin, Norbert J.; Cotman, Carl W.

    2013-01-01

    A major goal of animal research is to identify interventions that can promote successful aging and delay or reverse age-related cognitive decline in humans. Recent advances in standardizing cognitive assessment tools for humans have the potential to bring preclinical work closer to human research in aging and Alzheimer’s disease. The National Institute of Health (NIH) has led an initiative to develop a comprehensive Toolbox for Neurologic Behavioral Function (NIH Toolbox) to evaluate cognitive, motor, sensory and emotional function for use in epidemiologic and clinical studies spanning 3 to 85 years of age. This paper aims to analyze the strengths and limitations of animal behavioral tests that can be used to parallel those in the NIH Toolbox. We conclude that there are several paradigms available to define a preclinical battery that parallels the NIH Toolbox. We also suggest areas in which new tests may benefit the development of a comprehensive preclinical test battery for assessment of cognitive function in animal models of aging and Alzheimer’s disease. PMID:23434040

  19. Curriculum: The Keystone for Special Education Planning.

    ERIC Educational Resources Information Center

    Goldstein, Marjorie T.

    1986-01-01

    The article highlights the role of curriculum in special education within the contexts of the Individualized Education Program and a 6-S paradigm of the instructional program (someone, something, somebody, somehow, somewhere, and sometime). (CL)

  20. The "Flavor" of the Social Ecology Paradigm in Use: Building on Mutual Social Support in Preventing Drug Abuse.

    ERIC Educational Resources Information Center

    Thorsheim, Howard I.; Roberts, Bruce B.

    The "Bottled Pain" project, a drug abuse prevention program in 24 Lutheran congregations in southern Minnesota, is based on a social ecology paradigm designed to prevent drug abuse through the development of socially supportive relationshps and through using the environment as a natural strength within the community. According to the…

  1. The Integrated Joslin Performance Improvement/CME Program: A New Paradigm for Better Diabetes Care

    ERIC Educational Resources Information Center

    Brown, Julie A.; Beaser, Richard S.; Neighbours, James; Shuman, Jill

    2011-01-01

    Ongoing continuing medical education is an essential component of life-long learning and can have a positive influence on patient outcomes. However, some evidence suggests that continuing medical education has not fulfilled its potential as a performance improvement (PI) tool, in part due to a paradigm of CME that has focused on the quantity of…

  2. Early life stress paradigms in rodents: potential animal models of depression?

    PubMed

    Schmidt, Mathias V; Wang, Xiao-Dong; Meijer, Onno C

    2011-03-01

    While human depressive illness is indeed uniquely human, many of its symptoms may be modeled in rodents. Based on human etiology, the assumption has been made that depression-like behavior in rats and mice can be modulated by some of the powerful early life programming effects that are known to occur after manipulations in the first weeks of life. Here we review the evidence available in the literature for early life manipulations as risk factors for the development of depression-like symptoms such as anhedonia, passive coping strategies, and neuroendocrine changes. Early life paradigms that were evaluated include early handling, separation, and deprivation protocols, as well as enriched and impoverished environments. We have also included a small number of stress-related pharmacological models. We find that for most early life paradigms per se, the actual validity for depression is limited. A number of models have not been tested with respect to classical depression-like behaviors, while in many cases the outcome of such experiments is variable and depends on strain and additional factors. Because programming effects confer vulnerability rather than disease, a number of paradigms hold promise for usefulness in depression research, in combination with the proper genetic background and adult life challenges.

  3. Parallel Algorithms for Computational Models of Geophysical Systems

    NASA Astrophysics Data System (ADS)

    Carrillo Ledesma, A.; Herrera, I.; de la Cruz, L. M.; Hernández, G.; Grupo de Modelacion Matematica y Computacional

    2013-05-01

    Mathematical models of many systems of interest, including very important continuous systems of Earth Sciences and Engineering, lead to a great variety of partial differential equations (PDEs) whose solution methods are based on the computational processing of large-scale algebraic systems. Furthermore, the incredible expansion experienced by the existing computational hardware and software has made problems of ever-increasing diversity and complexity, posed by scientific and engineering applications, amenable to effective treatment. Parallel computing is outstanding among the new computational tools and, in order to effectively use the most advanced computers available today, massively parallel software is required. Domain decomposition methods (DDMs) have been developed precisely for treating PDEs effectively in parallel. Ideally, the main objective of domain decomposition research is to produce algorithms capable of 'obtaining the global solution by exclusively solving local problems', but up to now this has only been an aspiration, that is, a strong desire for achieving such a property, and so we call it 'the DDM-paradigm'. In recent times, numerically competitive DDM-algorithms are non-overlapping, preconditioned, and necessarily incorporate constraints, which pose an additional challenge for achieving the DDM-paradigm. Recently a group of four algorithms, referred to as the 'DVS-algorithms', which fulfill the DDM-paradigm, was developed. To derive them, a new discretization method, which uses a non-overlapping system of nodes (the derived-nodes), was introduced. This discretization procedure can be applied to any boundary-value problem, or system of such equations. In turn, the resulting system of discrete equations can be treated using any available DDM-algorithm. In particular, two of the four DVS-algorithms mentioned above were obtained by application of the well-known and very effective algorithms BDDC and FETI-DP; these will be referred to as the DVS-BDDC and DVS-FETI-DP algorithms. The other two, which will be referred to as the DVS-PRIMAL and DVS-DUAL algorithms, were obtained by application of two new algorithms that had not been previously reported in the literature. As said before, the four DVS-algorithms constitute a group of preconditioned and constrained algorithms that, for the first time, fulfill the DDM-paradigm. Both BDDC and FETI-DP are very well known, and both are highly efficient. Recently, it was established that these two methods are closely related and that their numerical performance is quite similar. On the other hand, through numerical experiments, we have established that the numerical performances of the members of the DVS-algorithms group (DVS-BDDC, DVS-FETI-DP, DVS-PRIMAL and DVS-DUAL) are very similar too. Furthermore, we have carried out comparisons of the performances of the standard versions of BDDC and FETI-DP with DVS-BDDC and DVS-FETI-DP, and in all such numerical experiments the DVS algorithms have performed significantly better.
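    The "global solution by solving local problems" idea can be seen in miniature in the classical alternating Schwarz method; the toy sketch below solves a 1D Poisson problem on two overlapping subdomains with local Jacobi sweeps. It is a textbook illustration of domain decomposition in general, not the non-overlapping DVS, BDDC, or FETI-DP algorithms the abstract describes.

        /* Toy overlapping Schwarz for -u'' = f on [0,1], u(0)=u(1)=0. */
        #define N 101   /* global grid points */

        static void local_jacobi(double *u, const double *f,
                                 int lo, int hi, double h2)
        {
            /* A few smoothing sweeps on one subdomain's interior; the values
               at lo and hi act as boundary data from the other subdomain. */
            for (int sweep = 0; sweep < 50; sweep++)
                for (int i = lo + 1; i < hi; i++)
                    u[i] = 0.5 * (u[i-1] + u[i+1] + h2 * f[i]);
        }

        void schwarz_solve(double *u, const double *f, int iters)
        {
            double h = 1.0 / (N - 1), h2 = h * h;
            int mid = N / 2, ovl = 5;   /* overlap width */
            for (int it = 0; it < iters; it++) {
                local_jacobi(u, f, 0, mid + ovl, h2);       /* left subdomain  */
                local_jacobi(u, f, mid - ovl, N - 1, h2);   /* right subdomain */
            }
        }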

  4. Concurrency-based approaches to parallel programming

    NASA Technical Reports Server (NTRS)

    Kale, L.V.; Chrisochoides, N.; Kohl, J.; Yelick, K.

    1995-01-01

    The inevitable transition to parallel programming can be facilitated by appropriate tools, including languages and libraries. After describing the needs of applications developers, this paper presents three specific approaches aimed at the development of efficient and reusable parallel software for irregular and dynamic-structured problems. A salient feature of all three approaches is their exploitation of concurrency within a processor. Benefits of individual approaches such as these can be leveraged by an interoperability environment which permits modules written using different approaches to co-exist in a single application.

  5. Reliability models for dataflow computer systems

    NASA Technical Reports Server (NTRS)

    Kavi, K. M.; Buckles, B. P.

    1985-01-01

    The demands for concurrent operation within a computer system and the representation of parallelism in programming languages have yielded a new form of program representation known as data flow (DENN 74, DENN 75, TREL 82a). A new model based on data flow principles for parallel computations and parallel computer systems is presented. Necessary conditions for liveness and deadlock freedom in data flow graphs are derived. The data flow graph is used as a model to represent asynchronous concurrent computer architectures, including data flow computers.

  6. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    2001-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  7. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    1999-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  8. Evaluating the Fabreville Heart Health Program in Laval, Canada: a dialogue between two paradigms, positivism and constructivism.

    PubMed

    Nguyen, Minh Nguyet; Otis, Joanne

    2003-06-01

    As part of the Canadian Federal-Provincial Initiative in Heart Health, the goal of the Fabreville Heart Health Program was to sensitize a district of Laval, Quebec's second most populous city, to heart-healthy behaviours. The program was planned and implemented by a committee composed of Fabreville community leaders and professionals from the Public Health Department. Between 1992 and 1994, intervention objectives were defined by the department in terms of changing individual behaviours associated with cardiovascular risk factors, namely diet, sedentariness and smoking, as well as adapting physical and social environments to facilitate these changes. However, from 1994 to its conclusion in 1997, the program was re-oriented to engage the population in mobilizing their own community and taking charge of interventions themselves. Actions then became dependent on the interests and motivation of Fabreville residents to transform their lifestyles and aspects of their physical environment. The initial evaluation process, based on the positivist paradigm, was designed to measure changes in individual behaviours and certain physical environments, such as an increase in designated non-smoking areas. However, following the re-orientation towards community mobilization, it was decided that evaluation should go beyond the professional production of data to include a process of the collective construction of knowledge. Evaluation methodology then became based on the constructivist paradigm. Yet field constraints such as lack of community involvement in both leadership and process evaluation, and the need to ensure evaluation standards and fulfil sponsor obligations, compelled the Public Health Department to return to using a certain number of positivist methods. The ensuing inter-paradigm dialogue helped broaden the scope of evaluation and contributed to gaining a more in-depth understanding of the processes and outcomes of community mobilization.

  9. Functional Bowel Disorders: A Roadmap to Guide the Next Generation of Research.

    PubMed

    Chang, Lin; Di Lorenzo, Carlo; Farrugia, Gianrico; Hamilton, Frank A; Mawe, Gary M; Pasricha, Pankaj J; Wiley, John W

    2018-02-01

    In June 2016, the National Institutes of Health hosted a workshop on functional bowel disorders (FBDs), particularly irritable bowel syndrome, with the objective of elucidating gaps in current knowledge and recommending strategies to address these gaps. The workshop aimed to provide a roadmap to help strategically guide research efforts during the next decade. Attendees were a diverse group of internationally recognized leaders in basic and clinical FBD research. This document summarizes the results of their deliberations, including the following general conclusions and recommendations. First, the high prevalence, economic burden, and impact on quality of life associated with FBDs necessitate an urgent need for improved understanding of FBDs. Second, preclinical discoveries are at a point that they can be realistically translated into novel diagnostic tests and treatments. Third, FBDs are broadly accepted as bidirectional disorders of the brain-gut axis, differentially affecting individuals throughout life. Research must integrate each component of the brain-gut axis and the influence of biological sex, early-life stressors, and genetic and epigenetic factors in individual patients. Fourth, research priorities to improve diagnostic and management paradigms include enhancement of the provider-patient relationship, longitudinal studies to identify risk and protective factors of FBDs, identification of biomarkers and endophenotypes in symptom severity and treatment response, and incorporation of emerging "-omics" discoveries. These paradigms can be applied by well-trained clinicians who are familiar with multimodal treatments. Fifth, essential components of a successful program will include the generation of a large, validated, broadly accessible database that is rigorously phenotyped; a parallel, linkable biorepository; dedicated resources to support peer-reviewed, hypothesis-driven research; access to dedicated bioinformatics expertise; and oversight by funding agencies to review priorities, progress, and potential synergies with relevant stakeholders. Copyright © 2018 AGA Institute. Published by Elsevier Inc. All rights reserved.

  10. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    NASA Technical Reports Server (NTRS)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  11. The Goddard Space Flight Center Program to develop parallel image processing systems

    NASA Technical Reports Server (NTRS)

    Schaefer, D. H.

    1972-01-01

    Parallel image processing, defined as image processing in which all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered parallel image processing techniques.

  12. Parallel Volunteer Learning during Youth Programs

    ERIC Educational Resources Information Center

    Lesmeister, Marilyn K.; Green, Jeremy; Derby, Amy; Bothum, Candi

    2012-01-01

    Lack of time is a hindrance for volunteers to participate in educational opportunities, yet volunteer success in an organization is tied to the orientation and education they receive. Meeting diverse educational needs of volunteers can be a challenge for program managers. Scheduling a Volunteer Learning Track for chaperones that is parallel to a…

  13. Topical perspective on massive threading and parallelism.

    PubMed

    Farber, Robert M

    2011-09-01

    Unquestionably, computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases three orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive threading and massive parallelism, as in the 1.3-million-thread Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts, be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world, is how to capitalize on these new massively threaded computational architectures, especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelism, with some insight into the future. Published by Elsevier Inc.

  14. Thinking Inside the Box: The Health Cube Paradigm for Health and Wellness Program Evaluation and Design

    PubMed Central

    Harris, Sharon

    2013-01-01

    Abstract Appropriately constructed health promotions can improve population health. The authors developed a practical model for designing, evaluating, and improving initiatives to provide optimal value. Three independent model dimensions (impact, engagement, and sustainability) and the resultant three-dimensional paradigm were described using hypothetical case studies, including a walking challenge, a health risk assessment survey, and an individual condition management program. The 3-dimensional model is illustrated and the dimensions are defined. Calculation of a 3-dimensional score for program comparisons, refinements, and measurement is explained. Program 1, the walking challenge, had high engagement and impact, but limited sustainability. Program 2, the health risk assessment survey, had high engagement and sustainability but limited impact. Program 3, the on-site condition management program, had measurable impact and sustainability but limited engagement, because of a lack of program capacity. Each initiative, though successful in 2 dimensions, lacked sufficient evolution along the third axis for optimal value. Calculation of a 3-dimensional score is useful for health promotion program development comparison and refinements, and overall measurement of program success. (Population Health Management 2013;16:291–295) PMID:23869538

  15. Programming with Intervals

    NASA Astrophysics Data System (ADS)

    Matsakis, Nicholas D.; Gross, Thomas R.

    Intervals are a new, higher-level primitive for parallel programming with which programmers directly construct the program schedule. Programs using intervals can be statically analyzed to ensure that they do not deadlock or contain data races. In this paper, we demonstrate the flexibility of intervals by showing how to use them to emulate common parallel control-flow constructs like barriers and signals, as well as higher-level patterns such as bounded-buffer producer-consumer. We have implemented intervals as a publicly available library for Java and Scala.

  16. Classification of hyperspectral imagery using MapReduce on a NVIDIA graphics processing unit (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Ramirez, Andres; Rahnemoonfar, Maryam

    2017-04-01

    A hyperspectral image provides a multidimensional, data-rich representation consisting of hundreds of spectral bands. Analyzing the spectral and spatial information of such an image with linear and non-linear algorithms results in long computation times. In order to overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that can help analyze a hyperspectral image through the use of parallel hardware and a parallel programming model, which is simpler to handle compared to other low-level parallel programming models. Additionally, Hadoop was used as an open-source implementation of the MapReduce parallel programming model. This research compared classification accuracy and timing results between the Hadoop and GPU systems, and tested them against the following cases: a combined CPU and GPU test case, a CPU-only test case, and a test case where no dimensionality reduction was applied.
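    The underlying map-reduce pattern can be sketched generically: a "map" step applied independently to each pixel and a "reduce" step combining partial results. The OpenMP sketch below illustrates only the pattern and is unrelated to the paper's Hadoop/GPU implementation; classify is a hypothetical per-pixel classifier returning a class index.

        /* Generic map-reduce sketch: map = per-pixel classification,
           reduce = merging per-thread class counts. */
        #define NCLASSES 16

        int classify(const float *pixel);   /* hypothetical, returns 0..NCLASSES-1 */

        void map_reduce_histogram(const float *pixels, int npix, int nbands,
                                  long counts[NCLASSES])
        {
            for (int c = 0; c < NCLASSES; c++) counts[c] = 0;

            #pragma omp parallel
            {
                long local[NCLASSES] = {0};        /* per-thread partial result */
                #pragma omp for nowait
                for (int i = 0; i < npix; i++)     /* map step */
                    local[classify(&pixels[(long)i * nbands])]++;

                #pragma omp critical               /* reduce step */
                for (int c = 0; c < NCLASSES; c++)
                    counts[c] += local[c];
            }
        }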

  17. Transformational derivation of programs using the Focus system

    NASA Technical Reports Server (NTRS)

    Reddy, Uday S.

    1988-01-01

    A program derivation support system called Focus is being constructed. It will formally derive programs using the paradigm of program transformation. The following issues are discussed: (1) the integration of validation and program derivation activities in the Focus system; (2) its tree-based user interface; (3) the control of search spaces in program derivation; and (4) the structure and organization of program derivation records. The inference procedures of the system are based on the integration of functional and logic programming principles. This brings about a synthesis of paradigms that were heretofore considered far apart, such as logical and executable specifications and constructive and transformational approaches to program derivation. Great emphasis has been placed, in the design of Focus, on achieving small search spaces during program derivation. The program manipulation operations, such as expansion, simplification, and rewriting, were designed with this objective. The role of operations that are expensive in search spaces, such as folding, has been reduced. Program derivations are documented in Focus in such a way that high-level descriptions of derivations are expressed using only program-level information. All the meta-level information, together with dependencies between derivations of program components, is automatically recorded by the system at a lower level of description for its own use in replay.

  18. File concepts for parallel I/O

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1989-01-01

    The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, and implementations using multiple storage devices are suggested. Problem areas are also identified and discussed.
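    The concurrent-access idea sketched in this paper was later standardized in MPI-IO; as a modern analogue (not the paper's own interface), the sketch below has every MPI process write a disjoint block of one shared file.

        /* Each process writes its own disjoint block of a shared file. */
        #include <mpi.h>

        #define BLOCK 1024

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            char buf[BLOCK];
            for (int i = 0; i < BLOCK; i++) buf[i] = (char)('A' + rank % 26);

            MPI_File fh;
            MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                          MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
            /* Offsets are disjoint, so concurrent accesses never overlap. */
            MPI_File_write_at(fh, (MPI_Offset)rank * BLOCK, buf, BLOCK,
                              MPI_CHAR, MPI_STATUS_IGNORE);
            MPI_File_close(&fh);

            MPI_Finalize();
            return 0;
        }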

  19. Program For Parallel Discrete-Event Simulation

    NASA Technical Reports Server (NTRS)

    Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

    1991-01-01

    The user does not have to add any special logic to aid in synchronization. The Time Warp Operating System (TWOS) computer program is a special-purpose operating system designed to support parallel discrete-event simulation. It is a complete implementation of the Time Warp mechanism and supports only simulations and other computations designed for virtual time. The Time Warp Simulator (TWSIM) subdirectory contains a sequential simulation engine that is interface-compatible with TWOS. TWOS and TWSIM are written in, and support simulations in, the C programming language.

  20. LLMapReduce: Multi-Lingual Map-Reduce for Supercomputing Environments

    DTIC Science & Technology

    2015-11-20

    …Popularized by Google [36] and Apache Hadoop [37], map-reduce has become a staple technology of the ever-growing big data community… The map-reduce parallel programming model has become extremely popular in the big data community. Many big data… to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming…

  1. Creating a Parallel Version of VisIt for Microsoft Windows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitlock, B J; Biagas, K S; Rawson, P L

    2011-12-07

    VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers, from modest desktops up to massively parallel clusters. VisIt is comprised of a set of cooperating programs. All programs can be run locally or in client/server mode in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.

  2. Using the self-select paradigm to delineate the nature of speech motor programming.

    PubMed

    Wright, David L; Robin, Don A; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H; Fox, Peter T

    2009-06-01

    The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial order demands of longer sequences. A modified reaction time paradigm was used to assess INT and SEQ demands. Specifically, syllable complexity was dependent on syllable structure, whereas sequence complexity involved either repeated or unique syllables within an utterance. INT execution was slowed when articulating single syllables in the form CCCV compared to simpler CV syllables. Planning unique syllables within a multisyllabic utterance rather than repetitions of the same syllable slowed INT but not SEQ. The INT speech motor programming process, important for mental syllabary access, is sensitive to changes in both syllable structure and the number of unique syllables in an utterance.

  3. Debugging Fortran on a shared memory machine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Allen, T.R.; Padua, D.A.

    1987-01-01

    Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.
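    The nondeterminism in question typically enters through unsynchronized access to shared data. The sketch below is a generic C/OpenMP example (the report itself concerns Fortran): the unprotected counter update is a data race, so the printed value can differ from run to run.

        /* A data race: the final counter value varies between runs. */
        #include <stdio.h>

        int main(void)
        {
            long counter = 0;
            #pragma omp parallel for
            for (long i = 0; i < 1000000; i++)
                counter++;   /* unsynchronized read-modify-write */

            /* Deterministic fix: #pragma omp parallel for reduction(+:counter) */
            printf("counter = %ld (expected 1000000)\n", counter);
            return 0;
        }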

  4. Managing Change in a Research University Special Education Program.

    ERIC Educational Resources Information Center

    Affleck, James Q.; Lowenbraun, Sheila

    1995-01-01

    This article describes the restructuring of the Special Education Teacher Education program at the University of Washington. Analysis indicated a paradigm shift as well as intense collaborative activity and significant programmatic and curricular modifications. Descriptions of the type of personnel needed in the new program are offered, and goals…

  5. A New Approach to Accountability: Creating Effective Learning Environments for Programs

    ERIC Educational Resources Information Center

    Surr, Wendy

    2012-01-01

    This article describes a new paradigm for accountability that envisions afterschool programs as learning organizations continually engaged in improving quality. Nearly 20 years into the era of results-based accountability, a new generation of afterschool accountability systems is emerging. Rather than aiming to test whether programs have produced…

  6. Career Development in Language Education Programs

    ERIC Educational Resources Information Center

    Shawer, Saad Fathy; Alkahtani, Saad Ali

    2013-01-01

    This study assesses the influence of a two-year language program evaluation on program directors and faculty career development. The study makes use of mixed-paradigms (positivism and qualitative interpretive), mixed-strategies (survey research and qualitative evaluation), one-way analysis of variance (ANOVA) and a post-hoc test of multiple…

  7. A portable MPI-based parallel vector template library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  8. A Portable MPI-Based Parallel Vector Template Library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  9. Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
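    A minimal sketch of the MPI-2 one-sided (RMA) style considered in the study: rank 0 puts data directly into a memory window exposed by rank 1, with fences providing synchronization. This is a generic illustration of the programming model, not the benchmark kernels themselves.

        /* MPI-2 RMA: one-sided put into a remote window, fence-synchronized. */
        #include <mpi.h>
        #include <stdio.h>

        #define COUNT 4

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            double local[COUNT] = {0};
            double msg[COUNT] = {1.0, 2.0, 3.0, 4.0};

            MPI_Win win;
            /* Every process exposes its buffer for remote access. */
            MPI_Win_create(local, COUNT * sizeof(double), sizeof(double),
                           MPI_INFO_NULL, MPI_COMM_WORLD, &win);

            MPI_Win_fence(0, win);
            if (rank == 0)   /* write into rank 1's window: no matching receive */
                MPI_Put(msg, COUNT, MPI_DOUBLE, 1, 0, COUNT, MPI_DOUBLE, win);
            MPI_Win_fence(0, win);

            if (rank == 1)
                printf("rank 1 received %g %g %g %g\n",
                       local[0], local[1], local[2], local[3]);

            MPI_Win_free(&win);
            MPI_Finalize();
            return 0;
        }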

  10. Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hirata, So

    2003-11-20

    We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted on by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes common binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirements, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).
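
    The gain from the binary-contraction factorization step can be made concrete with a toy example; this is illustrative C++, not TCE-generated code. Evaluating R[a][b] = sum over c,d of T1[a][c] V[c][d] T2[d][b] through an intermediate I = T1 V costs two O(N^3) binary contractions instead of one O(N^4) triple sum.

      // Toy illustration of factorizing a multiple tensor contraction into
      // binary contractions via an intermediate tensor.
      #include <vector>
      using Mat = std::vector<std::vector<double>>;

      Mat contract(const Mat& X, const Mat& Y) {       // one binary contraction
          size_t n = X.size();
          Mat Z(n, std::vector<double>(n, 0.0));
          for (size_t i = 0; i < n; i++)
              for (size_t k = 0; k < n; k++)
                  for (size_t j = 0; j < n; j++)
                      Z[i][j] += X[i][k] * Y[k][j];
          return Z;
      }

      Mat R(const Mat& T1, const Mat& V, const Mat& T2) {
          Mat I = contract(T1, V);    // intermediate tensor, reusable elsewhere
          return contract(I, T2);     // second binary contraction
      }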

  11. Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi

    Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in an OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port the Gyrokinetic Toroidal Code - Princeton (GTC-P), a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming, while the performance is comparable to that of the MPI version. Because the global-view programming model is suitable for expressing the data parallelism of a field of grid space data, we implement a hybrid-view version that uses the global-view programming model to compute the field and the local-view programming model to compute the movement of particles. The performance of the hybrid-view version is degraded by 20% compared with the original MPI version, but it facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.
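
    A hedged sketch of the global-view style in XMP's C binding follows. The directive forms are taken from the XcalableMP specification, but this is not code from the GTC-P port, and the local-view half (coarray reads and writes of a remote image) is omitted.

      /* Sketch of XMP global-view directives: distribute a template over
         nodes, align an array with it, and let the global loop run on the
         owning nodes. Illustrative only. */
      #pragma xmp nodes p(4)
      #pragma xmp template t(0:99)
      #pragma xmp distribute t(block) onto p

      double field[100];
      #pragma xmp align field[i] with t(i)

      int main(void) {
          /* Written once as a global loop; each iteration executes on the
             node that owns field[i] under the block distribution. */
      #pragma xmp loop on t(i)
          for (int i = 0; i < 100; i++)
              field[i] = 0.0;
          return 0;
      }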

  12. Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

    DOE PAGES

    Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi; ...

    2016-06-01

    Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in an OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port the Gyrokinetic Toroidal Code - Princeton (GTC-P), a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming, while the performance is comparable to that of the MPI version. Because the global-view programming model is suitable for expressing the data parallelism of a field of grid space data, we implement a hybrid-view version that uses the global-view programming model to compute the field and the local-view programming model to compute the movement of particles. The performance of the hybrid-view version is degraded by 20% compared with the original MPI version, but it facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.

  13. Parent-Child Parallel-Group Intervention for Childhood Aggression in Hong Kong

    ERIC Educational Resources Information Center

    Fung, Annis L. C.; Tsang, Sandra H. K. M.

    2006-01-01

    This article reports the original evidence-based outcome study on parent-child parallel group-designed Anger Coping Training (ACT) program for children aged 8-10 with reactive aggression and their parents in Hong Kong. This research program involved experimental and control groups with pre- and post-comparison. Quantitative data collection…

  14. Parallel computer vision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Uhr, L.

    1987-01-01

    This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.

  15. Parallel Performance of a Combustion Chemistry Simulation

    DOE PAGES

    Skinner, Gregg; Eigenmann, Rudolf

    1995-01-01

    We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.

  16. Algorithms and programming tools for image processing on the MPP, part 2

    NASA Technical Reports Server (NTRS)

    Reeves, Anthony P.

    1986-01-01

    A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites.

  17. Scalable Unix commands for parallel processors: a high-performance implementation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ong, E.; Lusk, E.; Gropp, W.

    2001-06-22

    We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.
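
    The pattern behind such commands can be sketched as follows; this is an illustrative MPI program in the spirit of a parallel hostname or ps, not the project's actual source. Each rank does its local work and rank 0 gathers the results for display.

      // Illustrative sketch of a parallel Unix-style command in MPI.
      #include <mpi.h>
      #include <unistd.h>
      #include <cstdio>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          char host[64] = {0};
          gethostname(host, sizeof(host) - 1);     // per-node local work

          char* all = nullptr;
          if (rank == 0) all = new char[64 * size];
          MPI_Gather(host, 64, MPI_CHAR, all, 64, MPI_CHAR, 0, MPI_COMM_WORLD);

          if (rank == 0) {                         // rank 0 formats the output
              for (int r = 0; r < size; r++)
                  std::printf("%d: %s\n", r, all + 64 * r);
              delete[] all;
          }
          MPI_Finalize();
          return 0;
      }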

  18. Parallel language constructs for tensor product computations on loosely coupled architectures

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1989-01-01

    A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The authors focus on tensor product array computations, a simple but important class of numerical algorithms. They consider first the problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, and then look at how such parallel kernels can be combined to form parallel tensor product algorithms.
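
    The tensor-product structure those primitives target can be sketched as a 1-D kernel swept along each dimension in turn; the 3-point smoother below is only a stand-in for a parallel tridiagonal solve, and the independent rows (or columns) are where the parallelism lies.

      // Sketch of the tensor-product pattern: apply a 1-D kernel routine
      // along rows, then along columns, of a 2-D array.
      #include <vector>
      using Grid = std::vector<std::vector<double>>;

      void kernel1d(std::vector<double>& x) {          // 1-D kernel routine
          std::vector<double> y(x);
          for (size_t i = 1; i + 1 < x.size(); i++)
              x[i] = 0.25 * y[i - 1] + 0.5 * y[i] + 0.25 * y[i + 1];
      }

      void tensorProductApply(Grid& u) {
          for (auto& row : u) kernel1d(row);           // sweep dimension 1
          for (size_t j = 0; j < u[0].size(); j++) {   // sweep dimension 2
              std::vector<double> col(u.size());
              for (size_t i = 0; i < u.size(); i++) col[i] = u[i][j];
              kernel1d(col);
              for (size_t i = 0; i < u.size(); i++) u[i][j] = col[i];
          }
      }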

  19. A CS1 pedagogical approach to parallel thinking

    NASA Astrophysics Data System (ADS)

    Rague, Brian William

    Almost all collegiate programs in Computer Science offer an introductory course in programming primarily devoted to communicating the foundational principles of software design and development. The ACM designates this introduction to computer programming course for first-year students as CS1, during which methodologies for solving problems within a discrete computational context are presented. Logical thinking is highlighted, guided primarily by a sequential approach to algorithm development and made manifest by typically using the latest, commercially successful programming language. In response to the most recent developments in accessible multicore computers, instructors of these introductory classes may wish to include training on how to design workable parallel code. Novel issues arise when programming concurrent applications, which can make teaching these concepts to beginning programmers a seemingly formidable task. Student comprehension of design strategies related to parallel systems should be monitored to ensure an effective classroom experience. This research investigated the feasibility of integrating parallel computing concepts into the first-year CS classroom. To quantitatively assess student comprehension of parallel computing, an experimental educational study using a two-factor mixed group design was conducted to evaluate two instructional interventions in addition to a control group: (1) topic lecture only, and (2) topic lecture with laboratory work using a software visualization Parallel Analysis Tool (PAT) specifically designed for this project. A new evaluation instrument developed for this study, the Perceptions of Parallelism Survey (PoPS), was used to measure student learning regarding parallel systems. The results from this educational study show a statistically significant main effect among the repeated measures, implying that student comprehension levels of parallel concepts as measured by the PoPS improve immediately after the delivery of any initial three-week CS1-level module when compared with student comprehension levels just prior to starting the course. Survey results measured during the ninth week of the course reveal that performance levels remained high compared to pre-course performance scores. A second result produced by this study reveals no statistically significant interaction effect between the intervention method and student performance as measured by the evaluation instrument over three separate testing periods. However, visual inspection of survey score trends and the low p-value generated by the interaction analysis (0.062) indicate that further studies may verify improved concept retention levels for the lecture-with-PAT group.

  20. YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste

    Modern embedded systems include hundreds of cores. Because of the difficulty in providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism, while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets difficult to partition, unpredictable memory accesses, unbalanced control flow and fine grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex and highly-parallel architectures and the high level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework for the automatic parallelization of irregular applications on modern MPSoCs based on LLVM. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.

  1. PISCES 2 users manual

    NASA Technical Reports Server (NTRS)

    Pratt, Terrence W.

    1987-01-01

    PISCES 2 is a programming environment and set of extensions to Fortran 77 for parallel programming. It is intended to provide a basis for writing programs for scientific and engineering applications on parallel computers in a way that is relatively independent of the particular details of the underlying computer architecture. This user's manual provides a complete description of the PISCES 2 system as it is currently implemented on the 20 processor Flexible FLEX/32 at NASA Langley Research Center.

  2. A language comparison for scientific computing on MIMD architectures

    NASA Technical Reports Server (NTRS)

    Jones, Mark T.; Patrick, Merrell L.; Voigt, Robert G.

    1989-01-01

    Choleski's method for solving banded symmetric, positive definite systems is implemented on a multiprocessor computer using three FORTRAN based parallel programming languages, the Force, PISCES and Concurrent FORTRAN. The capabilities of the languages for expressing parallelism and their user-friendliness are discussed, including readability of the code, the debugging assistance offered, and the expressiveness of the languages. The performance of the different implementations is compared. It is argued that PISCES, using the Force for medium-grained parallelism, is the appropriate choice for programming Choleski's method on the multiprocessor computer, Flex/32.

  3. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools, developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. It is an interactive toolkit that transforms a serial Fortran application code into an equivalent parallel version of the software in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes, ranging from benchmarks to real-world application codes, is presented. This demonstrates the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally, a set of tutorials is included for hands-on experience with this toolkit.

  4. Management of Patient Experience With ATX-101 (Deoxycholic Acid Injection) for Reduction of Submental Fat.

    PubMed

    Dover, Jeffrey S; Kenkel, Jeffrey M; Carruthers, Alastair; Lizzul, Paul F; Gross, Todd M; Subramanian, Meenakshi; Beddingfield, Frederick C

    2016-11-01

    ATX-101 (deoxycholic acid injection; Kythera Biopharmaceuticals, Inc., Westlake Village, CA [an affiliate of Allergan plc, Dublin, Ireland]) was recently approved for submental fat (SMF) reduction in the United States (Kybella) and Canada (Belkyra). The pivotal trials supporting these approvals revealed that ATX-101 is associated with common injection-site treatment reactions consistent with its mechanism of action and administration procedure. The purpose of this study was to evaluate 4 patient experience management paradigms targeting the common injection-site adverse events of pain, swelling/edema, and bruising after a single treatment session with ATX-101. In this double-blind, parallel-group, exploratory Phase 3b study (ClinicalTrials.gov identifier: NCT02007434), subjects with moderate to severe SMF were randomized 4:1 within each paradigm to receive ATX-101 2 mg/cm² or placebo. In Paradigm 1, subjects received a cold pack application to the treatment area. In Paradigm 2, in addition to cold pack application, subjects were treated with topical lidocaine and injectable lidocaine containing epinephrine. In Paradigm 3, in addition to the interventions of Paradigm 2, subjects received loratadine and ibuprofen. Subjects in Paradigm 4 received the same interventions as in Paradigm 3, plus application of a chin strap. Eighty-three subjects were treated. In ATX-101-treated subjects, peak pain occurred within 1 to 5 minutes of treatment, with median values at these time points ranging from 21.4 to 35.7 mm on a 100-mm pain visual analog scale ("mild"). Pain ratings reduced substantially by 15 minutes; at 4 hours after injection, pain was characterized as mild tenderness or mild achiness. Compared with cold alone, treatment with topical and injectable lidocaine reduced median peak pain by 17%. Addition of ibuprofen and loratadine resulted in a total reduction in pain by 40%. Peak swelling/edema in ATX-101-treated subjects was "modest," with mean values ≤1.7 (on a 0-5 scale) across all paradigms. Swelling/edema was not substantially mitigated by the interventions, including ibuprofen, loratadine, and the use of a chin strap. Bruising associated with ATX-101 treatment was confined to the treatment area, with mean values between 1.0 and 1.4 on a 0-to-5 scale. Bruising was modestly reduced by injectable lidocaine with epinephrine. Results from this study support the safety of ATX-101 for SMF reduction, and demonstrate that pain and bruising associated with ATX-101 treatment can be mitigated by a series of simple measures.

  5. Shifting the HIV training and research paradigm to address disparities in HIV outcomes

    PubMed Central

    LEVISON, Julie H.; ALEGRÍA, Margarita

    2016-01-01

    Tailored programs to diversify the pool of HIV/AIDS investigators and provide sufficient training and support for minority investigators to compete successfully are uncommon in the US and abroad. This paper encourages a shift in the HIV/AIDS training and research paradigm to effectively train and mentor Latino researchers in the US, Latin America and the Caribbean. We suggest three strategies to accomplish this: 1) coaching senior administrative and academic staff of HIV/AIDS training programs on the needs, values, and experiences unique to Latino investigators; 2) encouraging mentors to be receptive to a different set of research questions and approaches that Latino researchers offer due to their life experiences and perspectives; and 3) creating a virtual infrastructure to share resources and tackle challenges faced by minority researchers. Shifts in the research paradigm to include, retain, and promote Latino HIV/AIDS researchers will benefit the scientific process and the patients and communities who await the promise of HIV/AIDS research. PMID:27501811

  6. Parallel Excitation for B-Field Insensitive Fat-Saturation Preparation

    PubMed Central

    Heilman, Jeremiah A.; Derakhshan, Jamal D.; Riffe, Matthew J.; Gudino, Natalia; Tkach, Jean; Flask, Chris A.; Duerk, Jeffrey L.; Griswold, Mark A.

    2016-01-01

    Multichannel transmission has the potential to improve many aspects of MRI through a new paradigm in excitation. In this study, multichannel transmission is used to address the effects that variations in B0 homogeneity have on fat-saturation preparation through the use of the frequency, phase, and amplitude degrees of freedom afforded by independent transmission channels. B1 homogeneity is intrinsically included via use of coil sensitivities in calculations. A new method, parallel excitation for B-field insensitive fat-saturation preparation, can achieve fat saturation in 89% of voxels with Mz ≤ 0.1 in the presence of ±4 ppm B0 variation, where traditional CHESS methods achieve only 40% in the same conditions. While there has been much progress to apply multichannel transmission at high field strengths, particular focus is given here to application of these methods at 1.5 T. PMID:22247080

  7. Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

    1994-01-01

    The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops has been achieved on 512 nodes of the Intel Delta machine for a problem size of 1024 K equations (256 K grid points).
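
    A minimal sketch of the SPMD halo-exchange pattern underlying such a static domain decomposition follows, assuming a 1-D slab partition; the solver itself is omitted, and the layout is illustrative only.

      // SPMD sketch: each rank owns a slab of cells plus two halo cells and
      // exchanges one boundary value with each neighbor before updating.
      #include <mpi.h>
      #include <vector>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          const int n = 128;                    // local cells; u[0], u[n+1] are halos
          std::vector<double> u(n + 2, rank);
          int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
          int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

          // MPI_PROC_NULL turns the physical-boundary exchanges into no-ops.
          MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                       &u[n + 1], 1, MPI_DOUBLE, right, 0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 1,
                       &u[0], 1, MPI_DOUBLE, left, 1,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);

          // ... local update using the freshly received halo values ...
          MPI_Finalize();
          return 0;
      }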

  8. Are all ostracism experiences equal? A comparison of the autobiographical recall, Cyberball, and O-Cam paradigms.

    PubMed

    Godwin, Alexandra; MacNevin, Georgia; Zadro, Lisa; Iannuzzelli, Rose; Weston, Stephanie; Gonsalkorale, Karen; Devine, Patricia

    2014-09-01

    In the present study, we aimed to compare the primary-need depletion elicited by three common ostracism paradigms: autobiographical recall (e.g., Zhong & Leonardelli in Psychological Science 19:838-842, 2008), Cyberball (Williams, Cheung, & Choi in Journal of Personality and Social Psychology 79:748-762, 2000), and O-Cam (Goodacre & Zadro in Behavior Research Methods 42:768-774, 2010). A total of 152 participants (52 males) were randomly allocated to one of the three paradigms, and their subsequent primary needs were measured (belonging, control, self-esteem, and meaningful existence). O-Cam was found to induce greater total primary-need depletion than did Cyberball and recall, which did not differ significantly from each other. Moreover, when examining the pattern of individual need depletion elicited by each paradigm, O-Cam was found to induce significantly greater depletion of belonging, control, and meaningful existence than did the recall paradigm, and significantly greater depletion of control and self-esteem than did Cyberball. No other comparisons were found to be significant, including the comparisons between the recall and Cyberball paradigms for each individual primary need. Collectively, the findings will assist ostracism researchers in making informed choices regarding (a) which paradigm is appropriate to implement with respect to their research aims, and (b) whether the interchangeable use of paradigms within a program of research is appropriate practice.

  9. An investigation of habit learning in Anorexia Nervosa

    PubMed Central

    Godier, Lauren R; de Wit, Sanne; Pinto, Anthony; Steinglass, Joanna E; Greene, Ashley L; Scaife, Jessica; Gillan, Claire M.; Walsh, B. Timothy; Simpson, Helen-Blair; Park, Rebecca J

    2017-01-01

    Anorexia Nervosa (AN) is a disorder characterised by compulsive behaviour, such as self-starvation and excessive exercise, which develop in the pursuit of weight-loss. Recent theory suggests that once established, compulsive weight-loss behaviours in AN may become habitual. In two parallel studies, we measured whether individuals with AN showed a bias toward habits using two outcome-devaluation tasks. In Study 1, 23 women with AN (restrictive and binge/purge subtypes), and 18 healthy controls (HC) completed the slips-of-action paradigm, designed to assess reward-based habits. In Study 2, 13 women with restrictive AN, 14 women recovered from restrictive AN, and 17 female HC participants completed the slips-of-action paradigm, and an avoidance paradigm, designed to assess aversive habits. AN participants showed no deficit relative to HCs in the ability to use feedback to respond correctly to stimuli. Following devaluation of outcomes, all groups in both studies were equally able to withhold inappropriate responses, suggesting no deficit in the balance between goal-directed and habitual control of behaviour in these tasks in AN. These results suggest that individuals with AN do not show a generalised tendency to rely on habits in two outcome-devaluation tasks. Future research is needed to investigate the potential role of disorder-specific habits in the maintenance of behaviour in AN. PMID:27497292

  10. Long-Term Training with a Brain-Machine Interface-Based Gait Protocol Induces Partial Neurological Recovery in Paraplegic Patients.

    PubMed

    Donati, Ana R C; Shokur, Solaiman; Morya, Edgard; Campos, Debora S F; Moioli, Renan C; Gitti, Claudia M; Augusto, Patricia B; Tripodi, Sandra; Pires, Cristhiane G; Pereira, Gislaine A; Brasil, Fabricio L; Gallo, Simone; Lin, Anthony A; Takigami, Angelo K; Aratanha, Maria A; Joshi, Sanjay; Bleuler, Hannes; Cheng, Gordon; Rudolph, Alan; Nicolelis, Miguel A L

    2016-08-11

    Brain-machine interfaces (BMIs) provide a new assistive strategy aimed at restoring mobility in severely paralyzed patients. Yet, no study in animals or in human subjects has indicated that long-term BMI training could induce any type of clinical recovery. Eight chronic (3-13 years) spinal cord injury (SCI) paraplegics were subjected to long-term training (12 months) with a multi-stage BMI-based gait neurorehabilitation paradigm aimed at restoring locomotion. This paradigm combined intense immersive virtual reality training, enriched visual-tactile feedback, and walking with two EEG-controlled robotic actuators, including a custom-designed lower limb exoskeleton capable of delivering tactile feedback to subjects. Following 12 months of training with this paradigm, all eight patients experienced neurological improvements in somatic sensation (pain localization, fine/crude touch, and proprioceptive sensing) in multiple dermatomes. Patients also regained voluntary motor control in key muscles below the SCI level, as measured by EMGs, resulting in marked improvement in their walking index. As a result, 50% of these patients were upgraded to an incomplete paraplegia classification. Neurological recovery was paralleled by the reemergence of lower limb motor imagery at cortical level. We hypothesize that this unprecedented neurological recovery results from both cortical and spinal cord plasticity triggered by long-term BMI usage.

  11. Long-Term Training with a Brain-Machine Interface-Based Gait Protocol Induces Partial Neurological Recovery in Paraplegic Patients

    PubMed Central

    Donati, Ana R. C.; Shokur, Solaiman; Morya, Edgard; Campos, Debora S. F.; Moioli, Renan C.; Gitti, Claudia M.; Augusto, Patricia B.; Tripodi, Sandra; Pires, Cristhiane G.; Pereira, Gislaine A.; Brasil, Fabricio L.; Gallo, Simone; Lin, Anthony A.; Takigami, Angelo K.; Aratanha, Maria A.; Joshi, Sanjay; Bleuler, Hannes; Cheng, Gordon; Rudolph, Alan; Nicolelis, Miguel A. L.

    2016-01-01

    Brain-machine interfaces (BMIs) provide a new assistive strategy aimed at restoring mobility in severely paralyzed patients. Yet, no study in animals or in human subjects has indicated that long-term BMI training could induce any type of clinical recovery. Eight chronic (3–13 years) spinal cord injury (SCI) paraplegics were subjected to long-term training (12 months) with a multi-stage BMI-based gait neurorehabilitation paradigm aimed at restoring locomotion. This paradigm combined intense immersive virtual reality training, enriched visual-tactile feedback, and walking with two EEG-controlled robotic actuators, including a custom-designed lower limb exoskeleton capable of delivering tactile feedback to subjects. Following 12 months of training with this paradigm, all eight patients experienced neurological improvements in somatic sensation (pain localization, fine/crude touch, and proprioceptive sensing) in multiple dermatomes. Patients also regained voluntary motor control in key muscles below the SCI level, as measured by EMGs, resulting in marked improvement in their walking index. As a result, 50% of these patients were upgraded to an incomplete paraplegia classification. Neurological recovery was paralleled by the reemergence of lower limb motor imagery at cortical level. We hypothesize that this unprecedented neurological recovery results from both cortical and spinal cord plasticity triggered by long-term BMI usage. PMID:27513629

  12. Psychophysical testing of spatial and temporal dimensions of endogenous analgesia: conditioned pain modulation and offset analgesia.

    PubMed

    Honigman, Liat; Yarnitsky, David; Sprecher, Elliot; Weissman-Fogel, Irit

    2013-08-01

    The endogenous analgesia (EA) system is psychophysically evaluated using various paradigms, including conditioned pain modulation (CPM) and offset analgesia (OA) testing, respectively, the spatial and temporal filtering processes of noxious information. Though both paradigms assess the function of the EA system, it is still unknown whether they reflect the same aspects of EA and consequently whether they provide additive or equivalent data. Twenty-nine healthy volunteers (15 males) underwent 5 trials of different stimulation conditions in random order including: (1) the classic OA three-temperature stimulus train ('OA'); (2) a three-temperature stimulus train as control for the OA ('OAcon'); (3) a constant temperature stimulus ('constant'); (4) the classic parallel CPM ('CPM'); and (5) a combination of OA and CPM ('OA + CPM'). We found that in males, the pain reduction during the OA + CPM condition was greater than during the OA (P = 0.003) and CPM (P = 0.07) conditions. Furthermore, a correlation was found between OA and CPM (r = 0.62, P = 0.01) at the time of maximum OA effect. The additive effect found suggests that the two paradigms represent at least partially different aspects of EA. The moderate association between the CPM and OA magnitudes indicates, on the other hand, some commonality of their underlying mechanisms.

  13. Reframing Resilience: Pilot Evaluation of a Program to Promote Resilience in Marginalized Older Adults

    ERIC Educational Resources Information Center

    Fullen, Matthew C.; Gorby, Sean R.

    2016-01-01

    Resilience has been described as a paradigm for aging that is more inclusive than models that focus on physiological and functional abilities. We evaluated a novel program, Resilient Aging, designed to influence marginalized older adults' perceptions of their resilience, self-efficacy, and wellness. The multiweek group program incorporated an…

  14. Interdisciplinary Instruction in the Humanities Enrichment Program: Alignment of Programmatic, Pedagogic, and Learner Goals.

    ERIC Educational Resources Information Center

    Gudipati, Lakshmi

    This paper details the benefits of interdisciplinary studies, with particular focus on the Humanities Enrichment Program at the Community College of Philadelphia. The program uses a team-teaching, linked-course paradigm. Two courses from different disciplines are aligned, and faculty from each discipline teach the linked courses as humanities…

  15. The Impact of Different Teaching Approaches and Languages on Student Learning of Introductory Programming Concepts

    ERIC Educational Resources Information Center

    Kunkle, Wanda M.

    2010-01-01

    Many students experience difficulties learning to program. They find learning to program in the object-oriented paradigm particularly challenging. As a result, computing educators have tried a variety of instructional methods to assist beginning programmers. These include developing approaches geared specifically toward novices and experimenting…

  16. Rapid data processing for ultrafast X-ray computed tomography using scalable and modular CUDA based pipelines

    NASA Astrophysics Data System (ADS)

    Frust, Tobias; Wagner, Michael; Stephan, Jan; Juckeland, Guido; Bieberle, André

    2017-10-01

    Ultrafast X-ray tomography is an advanced imaging technique for the study of dynamic processes based on the principles of electron beam scanning. A typical application case for this technique is the study of multiphase flows, that is, flows of mixtures of substances such as gas-liquid flows in pipelines or chemical reactors. At Helmholtz-Zentrum Dresden-Rossendorf (HZDR) a number of such tomography scanners are operated. Currently, there are two main points limiting their application in some fields. First, after each CT scan sequence the data of the radiation detector must be downloaded from the scanner to a data processing machine. Second, the current data processing is time-consuming relative to the CT scan sequence interval. To enable online observations or to use this technique to control actuators in real time, a modular and scalable data processing tool has been developed, consisting of user-definable stages working independently together in a so-called data processing pipeline, that keeps up with the CT scanner's maximal frame rate of up to 8 kHz. The newly developed data processing stages are freely programmable and combinable. In order to achieve the highest processing performance, all relevant data processing steps required for a standard slice-image reconstruction were individually implemented in separate stages using Graphics Processing Units (GPUs) and NVIDIA's CUDA programming language. Data processing performance tests on different high-end GPUs (Tesla K20c, GeForce GTX 1080, Tesla P100) showed excellent performance.
    Program files DOI: http://dx.doi.org/10.17632/65sx747rvm.1
    Licensing provisions: LGPLv3
    Programming language: C++/CUDA
    Supplementary material: Test data set, used for the performance analysis.
    Nature of problem: Ultrafast computed tomography is performed with a scan rate of up to 8 kHz. To obtain cross-sectional images from projection data, computer-based image reconstruction algorithms must be applied. The objective of the presented program is to reconstruct a data stream of around 1.3 GB/s in a minimum time period, which opens new fields of application and allows the future use of even more compute-intensive algorithms, especially for data post-processing, to improve the quality of data analysis.
    Solution method: The program solves the given problem using a two-step process: first, by a generic, expandable and widely applicable template library implementing the streaming paradigm (GLADOS); second, by optimized processing stages for ultrafast computed tomography implementing the required algorithms in a performance-oriented way using CUDA (RISA). Thereby, task parallelism between the processing stages as well as data parallelism within one processing stage is realized.
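
    The stage-and-pipeline structure can be sketched with CPU threads and queues; the real stages are CUDA kernels, so this C++ analogue shows only the plumbing, and all names (Channel, filter stage) are illustrative rather than taken from GLADOS/RISA.

      // Sketch of a two-stage streaming pipeline: stages run concurrently
      // and hand frames to each other through thread-safe FIFOs.
      #include <condition_variable>
      #include <mutex>
      #include <queue>
      #include <thread>
      #include <vector>

      template <typename T>
      class Channel {                       // unbounded FIFO between stages
          std::queue<T> q;
          std::mutex m;
          std::condition_variable cv;
      public:
          void push(T v) {
              { std::lock_guard<std::mutex> l(m); q.push(std::move(v)); }
              cv.notify_one();
          }
          T pop() {
              std::unique_lock<std::mutex> l(m);
              cv.wait(l, [&] { return !q.empty(); });
              T v = std::move(q.front()); q.pop();
              return v;
          }
      };

      int main() {
          Channel<std::vector<float>> raw, filtered;
          std::thread filter([&] {          // stage 1: filter each projection
              for (int i = 0; i < 100; i++) {
                  auto frame = raw.pop();
                  for (auto& x : frame) x *= 0.5f;   // stand-in for filtering
                  filtered.push(std::move(frame));
              }
          });
          std::thread source([&] {          // stage 0: detector readout
              for (int i = 0; i < 100; i++)
                  raw.push(std::vector<float>(1024, 1.0f));
          });
          source.join(); filter.join();
          return 0;
      }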

  17. PIPS-SBB: A Parallel Distributed-Memory Branch-and-Bound Algorithm for Stochastic Mixed-Integer Programs

    DOE PAGES

    Munguia, Lluis-Miquel; Oxberry, Geoffrey; Rajan, Deepak

    2016-05-01

    Stochastic mixed-integer programs (SMIPs) deal with optimization under uncertainty at many levels of the decision-making process. When solved as extensive formulation mixed-integer programs, problem instances can exceed available memory on a single workstation. In order to overcome this limitation, we present PIPS-SBB: a distributed-memory parallel stochastic MIP solver that takes advantage of parallelism at multiple levels of the optimization process. We also show promising results on the SIPLIB benchmark by combining methods known for accelerating Branch and Bound (B&B) methods with new ideas that leverage the structure of SMIPs. Finally, we expect the performance of PIPS-SBB to improve further as more functionality is added in the future.
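
    A schematic of the B&B loop being parallelized may help; this toy is not PIPS-SBB code, and its bounds and leaf rule are placeholders. In the distributed-memory version, the node pool and the incumbent bound are what the ranks share and keep consistent.

      // Schematic branch-and-bound skeleton: pop a node, prune against the
      // incumbent, otherwise evaluate as a leaf or branch into children.
      #include <algorithm>
      #include <queue>
      #include <vector>

      struct Node { double lower_bound; std::vector<int> fixed; };

      double branchAndBound(std::queue<Node> pool, double incumbent) {
          while (!pool.empty()) {
              Node n = pool.front(); pool.pop();
              if (n.lower_bound >= incumbent) continue;     // prune
              if (n.fixed.size() == 10) {                   // toy leaf rule
                  incumbent = std::min(incumbent, n.lower_bound);
                  continue;
              }
              for (int v : {0, 1}) {                        // branch on a variable
                  Node child = n;
                  child.fixed.push_back(v);
                  child.lower_bound = n.lower_bound + v;    // placeholder bound
                  pool.push(child);
              }
          }
          return incumbent;
      }

      int main() {
          std::queue<Node> pool;
          pool.push(Node{0.0, {}});
          double best = branchAndBound(pool, 100.0);
          (void)best;
          return 0;
      }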

  18. On the utility of threads for data parallel programming

    NASA Technical Reports Server (NTRS)

    Fahringer, Thomas; Haines, Matthew; Mehrotra, Piyush

    1995-01-01

    Threads provide a useful programming model for asynchronous behavior because of their ability to encapsulate units of work that can then be scheduled for execution at runtime, based on the dynamic state of a system. Recently, the threaded model has been applied to the domain of data parallel scientific codes, and initial reports indicate that the threaded model can produce performance gains over non-threaded approaches, primarily through the use of overlapping useful computation with communication latency. However, overlapping computation with communication is possible without the benefit of threads if the communication system supports asynchronous primitives, and this comparison has not been made in previous papers. This paper provides a critical look at the utility of lightweight threads as applied to data parallel scientific programming.
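
    The alternative the paper examines, overlap through asynchronous primitives alone without threads, looks roughly like this in MPI; the interior/boundary split is an assumed, typical structure.

      // Sketch: overlap useful computation with communication latency using
      // nonblocking MPI calls instead of threads.
      #include <mpi.h>
      #include <vector>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          int peer = rank ^ 1;                    // pair up ranks 0-1, 2-3, ...

          std::vector<double> out(1 << 16, rank), in(1 << 16);
          MPI_Request reqs[2];
          MPI_Irecv(in.data(), (int)in.size(), MPI_DOUBLE, peer, 0,
                    MPI_COMM_WORLD, &reqs[0]);
          MPI_Isend(out.data(), (int)out.size(), MPI_DOUBLE, peer, 0,
                    MPI_COMM_WORLD, &reqs[1]);

          double acc = 0.0;                       // interior work that does
          for (double x : out) acc += x * x;      // not need the incoming data
          (void)acc;

          MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
          // ... boundary work that consumes `in` goes here ...
          MPI_Finalize();
          return 0;
      }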

  19. Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems

    NASA Technical Reports Server (NTRS)

    Hinchey, Michael G.; Rash, James L.; Rouff, Christopher A.

    2005-01-01

    The manual application of formal methods in system specification has produced successes, but in the end, despite any claims and assertions by practitioners, there is no provable relationship between a manually derived system specification or formal model and the customer's original requirements. Complex parallel and distributed systems present the worst-case implications of today's dearth of viable approaches for achieving system dependability. No avenue other than formal methods constitutes a serious contender for resolving the problem, and so recognition of requirements-based programming has come at a critical juncture. We describe a new, NASA-developed automated requirements-based programming method that can be applied to certain classes of systems, including complex parallel and distributed systems, to achieve a high degree of dependability.

  20. A design methodology for portable software on parallel computers

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Miller, Keith W.; Chrisman, Dan A.

    1993-01-01

    This final report for research that was supported by grant number NAG-1-995 documents our progress in addressing two difficulties in parallel programming. The first difficulty is developing software that will execute quickly on a parallel computer. The second difficulty is transporting software between dissimilar parallel computers. In general, we expect that more hardware-specific information will be included in software designs for parallel computers than in designs for sequential computers. This inclusion is an instance of portability being sacrificed for high performance. New parallel computers are being introduced frequently. To keep software running on the current high-performance hardware, a software developer almost continually faces yet another expensive software transportation. The problem of the proposed research is to create a design methodology that helps designers to more precisely control both portability and hardware-specific programming details. The proposed research emphasizes programming for scientific applications. We completed our study of the parallelizability of a subsystem of the NASA Earth Radiation Budget Experiment (ERBE) data processing system. This work is summarized in section two. A more detailed description is provided in Appendix A ('Programming Practices to Support Eventual Parallelism'). Mr. Chrisman, a graduate student, wrote and successfully defended a Ph.D. dissertation proposal which describes our research associated with the issues of software portability and high performance. The list of research tasks is specified in the proposal. The proposal 'A Design Methodology for Portable Software on Parallel Computers' is summarized in section three and is provided in its entirety in Appendix B. We are currently studying a proposed subsystem of the NASA Clouds and the Earth's Radiant Energy System (CERES) data processing system. This software is the proof-of-concept for the Ph.D. dissertation. We have implemented and measured the performance of a portion of this subsystem on the Intel iPSC/2 parallel computer. These results are provided in section four. Our future work is summarized in section five, our acknowledgements are stated in section six, and references for published papers associated with NAG-1-995 are provided in section seven.

  1. Massively parallel sparse matrix function calculations with NTPoly

    NASA Astrophysics Data System (ADS)

    Dawson, William; Nakajima, Takahito

    2018-04-01

    We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well-developed framework with a wide range of applications, including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization-free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed-memory parallelization is based on a communication-avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
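
    The polynomial-expansion idea can be sketched as Horner evaluation of p(A), one matrix product per step; the dense toy below only marks where sparsity would be exploited, whereas NTPoly's products are truly sparse, thresholded, and distributed.

      // Sketch: evaluate p(A) = c0*I + c1*A + ... + cm*A^m by Horner's rule,
      // so each step is a single matrix-matrix product.
      #include <vector>
      using Mat = std::vector<std::vector<double>>;

      Mat multiply(const Mat& X, const Mat& Y) {
          size_t n = X.size();
          Mat Z(n, std::vector<double>(n, 0.0));
          for (size_t i = 0; i < n; i++)
              for (size_t k = 0; k < n; k++)
                  if (X[i][k] != 0.0)                  // where sparsity enters
                      for (size_t j = 0; j < n; j++)
                          Z[i][j] += X[i][k] * Y[k][j];
          return Z;
      }

      Mat polyOfMatrix(const Mat& A, const std::vector<double>& c) {
          size_t n = A.size();
          Mat P(n, std::vector<double>(n, 0.0));
          for (size_t i = 0; i < n; i++) P[i][i] = c.back();   // P = cm * I
          for (int k = (int)c.size() - 2; k >= 0; k--) {
              P = multiply(P, A);                              // P = P*A
              for (size_t i = 0; i < n; i++) P[i][i] += c[k];  // P += ck * I
          }
          return P;
      }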

  2. pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.

    PubMed

    Halic, Tansel; Ahn, Woojin; De, Suvranu

    2014-01-01

    This work presents pWeb, a new language and compiler for the parallelization of client-side compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. The web browser's low performance relative to native applications, however, remains the bottleneck for computationally intensive workloads, including visualization of complex scenes, real-time physical simulation, and image processing. The newly proposed language is built upon web workers for multithreaded programming in HTML5. The language provides the fundamental functionalities of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.
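
    pWeb itself targets JavaScript and HTML5 web workers; as a language-neutral illustration of the fork/join semantics it adds (and which raw web workers lack), here is the same pattern expressed with C++ futures.

      // Fork/join sketch: the parent forks a child task, computes its own
      // half, then joins on the child's result.
      #include <future>
      #include <numeric>
      #include <vector>

      double partialSum(const std::vector<double>& v, size_t lo, size_t hi) {
          return std::accumulate(v.begin() + lo, v.begin() + hi, 0.0);
      }

      int main() {
          std::vector<double> v(1 << 20, 1.0);
          // fork: spawn a child task for the upper half
          auto child = std::async(std::launch::async, partialSum,
                                  std::cref(v), v.size() / 2, v.size());
          double lower = partialSum(v, 0, v.size() / 2);  // parent's half
          double total = lower + child.get();             // join
          (void)total;
          return 0;
      }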

  3. Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Dean N.; Silva, Claudio

    2013-09-30

    For the past three years, a large analysis and visualization effort, funded by the Department of Energy's Office of Biological and Environmental Research (BER), the National Aeronautics and Space Administration (NASA), and the National Oceanic and Atmospheric Administration (NOAA), has brought together a wide variety of industry-standard scientific computing libraries and applications to create the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT) to serve the global climate simulation and observational research communities. To support interactive analysis and visualization, all components connect through a provenance application programming interface to capture meaningful history and workflow. Components can be loosely coupled into the framework for fast integration or tightly coupled for greater system functionality and communication with other components. The overarching goal of UV-CDAT is to provide a new paradigm for access to and analysis of massive, distributed scientific data collections by leveraging distributed data architectures located throughout the world. The UV-CDAT framework addresses challenges in analysis and visualization and incorporates new opportunities, including parallelism for better efficiency, higher speed, and more accurate scientific inferences. Today, it provides more than 600 users access to more analysis and visualization products than any other single source.

  4. Collateral damage control in cancer therapy: defining the stem identity in gliomas.

    PubMed

    Hsieh, David

    2011-01-01

    The discovery of discrete functional components in cancer systems advocates a paradigm shift in therapeutic design towards the targeted destruction of critical cellular constituents that fuel tumorigenic potential. In astrocytomas, malignant growth can be propagated and sustained by glioma stem cells (GSCs) endowed with highly efficient clonogenic and tumor initiation capacities. Given their disproportionate oncogenic contribution, GSCs are often considered the optimal targets for curative treatment because their eradication may subvert the refractory nature of GBMs. However, the close affinity of GSCs and normal neural stem cells (NSCs) is a cautionary note for off-target effects of GSC-based therapies. In fact, many parallels can be drawn between GSC and NSC functions, which ostensibly rely on a communal collection of stem cell-promoting transcription factors (TFs). Only through rigorous scrutiny of nuances in the stemness program of GSCs and NSCs may we clarify the pathogenic mechanisms of stemness factors and reveal processes exploited by cancer cells to co-opt stem cell traits. Importantly, discerning the specific requirements for GSC and NSC maintenance may be an essential requisite when assessing molecular targets for discriminatory targeting of GSCs with minimal sequelae.

  5. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Badal, Andreu; Badano, Aldo

    Purpose: It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: the use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). Methods: A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA programming model (NVIDIA Corporation, Santa Clara, CA). Results: An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed-up factor was obtained using a GPU compared to a single core CPU. Conclusions: The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
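
    The paradigm rests on photon histories being independent. A CPU-thread analogue with a private RNG stream per worker sketches the idea; the actual code maps histories to CUDA threads instead, and the transport and tally below are toys.

      // Sketch: independent Monte Carlo histories simulated in parallel
      // batches, each batch with its own random-number stream.
      #include <atomic>
      #include <random>
      #include <thread>
      #include <vector>

      std::atomic<long> absorbed{0};

      void simulateBatch(unsigned seed, long n) {
          std::mt19937 rng(seed);                       // private RNG stream
          std::uniform_real_distribution<double> u(0.0, 1.0);
          long local = 0;
          for (long i = 0; i < n; i++) {
              double depth = 0.0;
              while (u(rng) > 0.2) depth += 1.0;        // toy transport walk
              if (depth < 5.0) local++;                 // toy absorption tally
          }
          absorbed += local;                            // one atomic op per batch
      }

      int main() {
          std::vector<std::thread> pool;
          for (unsigned t = 0; t < 8; t++)
              pool.emplace_back(simulateBatch, 1234u + t, 100000L);
          for (auto& th : pool) th.join();
          return 0;
      }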

  6. GRADSPMHD: A parallel MHD code based on the SPH formalism

    NASA Astrophysics Data System (ADS)

    Vanaverbeke, S.; Keppens, R.; Poedts, S.

    2014-03-01

    We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the "GRAD-h" formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇·B = 0 constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture.
    Catalogue identifier: AERP_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 620503
    No. of bytes in distributed program, including test data, etc.: 19837671
    Distribution format: tar.gz
    Programming language: FORTRAN 90/MPI
    Computer: HPC cluster
    Operating system: Unix
    Has the code been vectorized or parallelized?: Yes, parallelized using MPI
    RAM: ~30 MB for a Sedov test including 15625 particles on a single CPU
    Classification: 12
    Nature of problem: Evolution of a plasma in the ideal MHD approximation.
    Solution method: The equations of magnetohydrodynamics are solved using the SPH method.
    Running time: The test provided takes approximately 20 min using 4 processors.
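
    The core SPH density estimate such codes are built around can be sketched as a kernel-weighted sum over neighbors. The all-pairs loop and Gaussian kernel below are simplifications of the tree-based neighbor search and GRAD-h machinery, assumed here only for illustration.

      // Sketch of the SPH density summation: rho_a = sum_b m_b W(|r_a - r_b|, h).
      #include <cmath>
      #include <vector>

      struct Particle { double x, y, z, m, rho; };

      double kernelW(double r, double h) {              // toy Gaussian kernel
          const double PI = 3.14159265358979323846;
          double q = r / h;
          return std::exp(-q * q) / (std::pow(PI, 1.5) * h * h * h);
      }

      void density(std::vector<Particle>& p, double h) {
          for (auto& a : p) {                           // parallel over particles
              a.rho = 0.0;
              for (const auto& b : p) {                 // all-pairs stand-in for
                  double dx = a.x - b.x;                // the tree neighbor search
                  double dy = a.y - b.y;
                  double dz = a.z - b.z;
                  a.rho += b.m * kernelW(std::sqrt(dx*dx + dy*dy + dz*dz), h);
              }
          }
      }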

  7. A neural-network approach to robotic control

    NASA Technical Reports Server (NTRS)

    Graham, D. P. W.; Deleuterio, G. M. T.

    1993-01-01

    An artificial neural-network paradigm for the control of robotic systems is presented. The approach is based on the Cerebellar Model Articulation Controller created by James Albus and incorporates several extensions. First, recognizing the essential structure of multibody equations of motion, two parallel modules are used that directly reflect the dynamical characteristics of multibody systems. Second, the architecture of the proposed network is imbued with a self-organizational capability which improves efficiency and accuracy. Also, the networks can be arranged in hierarchical fashion with each subsequent network providing finer and finer resolution.

  8. Searching for an Axis-Parallel Shoreline

    NASA Astrophysics Data System (ADS)

    Langetepe, Elmar

    We are searching for an unknown horizontal or vertical line in the plane under the competitive framework. We design a framework for lower bounds on all cyclic and monotone strategies that result in two-sequence functionals. For optimizing such functionals we apply a method that combines two main paradigms. The given solution shows that the combination method is of general interest. Finally, we obtain the current best strategy and can prove that this is the best strategy among all cyclic and monotone strategies which is a main step toward a lower bound construction.

  9. Low Temperature Performance of High-Speed Neural Network Circuits

    NASA Technical Reports Server (NTRS)

    Duong, T.; Tran, M.; Daud, T.; Thakoor, A.

    1995-01-01

    Artificial neural networks, derived from their biological counterparts, offer a new and enabling computing paradigm especially suitable for such tasks as image and signal processing with feature classification/object recognition, global optimization, and adaptive control. When implemented in fully parallel electronic hardware, they offer orders-of-magnitude speed advantages. The basic building blocks of the new architecture are the processing elements, called neurons, implemented as nonlinear operational amplifiers with a sigmoidal transfer function, interconnected through weighted connections, called synapses, implemented using circuitry for weight storage and multiply functions in either an analog, digital, or hybrid scheme.
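
    In software form, the processing element described is just a weighted sum passed through a sigmoid; the hardware realizes the same computation with operational amplifiers and analog weight circuits.

      // The neuron/synapse computation described above, as a function.
      #include <cmath>
      #include <vector>

      double neuron(const std::vector<double>& x,
                    const std::vector<double>& w, double bias) {
          double net = bias;
          for (size_t i = 0; i < x.size(); i++)
              net += w[i] * x[i];                  // synapse: weight * input
          return 1.0 / (1.0 + std::exp(-net));     // sigmoidal transfer function
      }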

  10. Hermes: Seamless delivery of containerized bioinformatics workflows in hybrid cloud (HTC) environments

    NASA Astrophysics Data System (ADS)

    Kintsakis, Athanassios M.; Psomopoulos, Fotis E.; Symeonidis, Andreas L.; Mitkas, Pericles A.

    Hermes introduces a new "describe once, run anywhere" paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.

  11. Parallel Logic Programming and Parallel Systems Software and Hardware

    DTIC Science & Technology

    1989-07-29

    Tools were provided for software development using artificial intelligence techniques, and work on AI software for massively parallel architectures was started.

  12. The force on the flex: Global parallelism and portability

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1986-01-01

    A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared-memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000-line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate: high enough to mask detailed architectural differences between multiprocessors, but low enough to give the user control over performance. The system is being ported to a medium-scale multiprocessor, the Flex/32, a 20-processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were modified to produce directly the system calls which form the basis for ConCurrent C. The implementation of the Fortran-based system is in step with Flexible Computer Corporation's implementation of a Fortran system in the parallel environment.

  13. Efficient Thread Labeling for Monitoring Programs with Nested Parallelism

    NASA Astrophysics Data System (ADS)

    Ha, Ok-Kyoon; Kim, Sun-Sook; Jun, Yong-Kee

    It is difficult and cumbersome to detect data races that occur in an execution of a parallel program. Any on-the-fly race-detection technique using Lamport's happened-before relation needs a thread-labeling scheme for generating unique identifiers that maintain logical concurrency information for the parallel threads. NR labeling is an efficient thread-labeling scheme for the fork-join program model with nested parallelism, because its efficiency depends only on the nesting depth of each fork and join operation. This paper presents an improved NR labeling, called e-NR labeling, in which every thread generates its label by inheriting the pointer to its ancestor list from its parent thread or by updating the pointer, in a constant amount of time and space. This labeling is more efficient than NR labeling because its efficiency does not depend on the nesting depth at every fork and join operation. Experiments were performed with OpenMP programs having nesting depths of three or four and maximum parallelism varying from 10,000 to 1,000,000. The results show that e-NR is 5 times faster than NR labeling and 4.3 times faster than OS labeling in the average time for creating and maintaining thread labels; in the average space required for labeling, it is 3.5 times smaller than NR labeling and 3 times smaller than OS labeling.
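
    For context, the textbook realization of Lamport's happened-before for fork-join programs is the vector clock, whose labels cost space and comparison time proportional to the number of thread slots; labeling schemes like NR and e-NR exist precisely to shrink that cost. The sketch below is our own illustration of that vector-clock baseline, not the NR or e-NR algorithm.

```python
from dataclasses import dataclass

@dataclass
class Thread:
    tid: int    # this thread's slot in every vector clock
    vc: list    # vector clock: one counter per thread slot

def fork(parent, child_tid):
    parent.vc[parent.tid] += 1
    child = Thread(child_tid, list(parent.vc))
    child.vc[child_tid] += 1
    return child

def join(parent, child):
    parent.vc = [max(a, b) for a, b in zip(parent.vc, child.vc)]
    parent.vc[parent.tid] += 1

def concurrent(a, b):
    """Neither label happened-before the other: a data race is possible."""
    return not (all(x <= y for x, y in zip(a.vc, b.vc)) or
                all(y <= x for x, y in zip(a.vc, b.vc)))

main = Thread(0, [0, 0, 0])
t1, t2 = fork(main, 1), fork(main, 2)
print(concurrent(t1, t2))    # True: sibling threads are concurrent
join(main, t1); join(main, t2)
t3 = fork(main, 1)           # slot 1 is reusable after t1 joined
print(concurrent(t2, t3))    # False: t3 starts after t2 was joined
```

    Every label here is as long as the number of thread slots; the point of e-NR is that labels are created and maintained in constant time and space per fork and join.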

  14. Cognitive characteristics of learning Java, an object-oriented programming language

    NASA Astrophysics Data System (ADS)

    White, Garry Lynn

    Industry and academia are moving from procedural programming languages (e.g., COBOL) to object-oriented programming languages, such as Java for the Internet. Past studies of the cognitive aspects of programming have focused primarily on procedural programming languages, among them Pascal, C, Basic, FORTRAN, and COBOL. Object-oriented programming (OOP) represents a new paradigm for computing. Industry is finding that programmers have difficulty shifting to this new paradigm, and instruction in OOP is now starting in colleges and universities across the country. What are the cognitive aspects of the OOP language Java? When is a student developmentally ready to handle the cognitive characteristics of Java? Which cognitive teaching style is best for Java? Questions such as these are the focus of this research. Such research is needed to improve understanding of the learning process and to identify students' difficulties with OOP methods, which can enhance academic teaching and industry training (Scholtz, 1993; Sheetz, 1997; Rosson, 1990). Cognitive development as measured by the Propositional Logic Test, cognitive style as measured by the Hemispheric Mode Indicator (HMI), and physical hemispheric dominance as measured by a self-report survey were obtained from thirty-six university students studying Java programming. Findings reveal that physical hemispheric dominance is unrelated to the cognitive and programming-language variables. However, both procedural and object-oriented programming require Piaget's formal-operation cognitive level, as indicated by the Propositional Logic Test; for procedural programming this is consistent with prior research, while for object-oriented programming it is a new finding. Another new finding is that object-oriented programming appears to be unrelated to hemispheric cognitive style as indicated by the HMI: object-oriented programming is friendly to either hemispheric thinking style, whereas procedural programming favors a left-hemispheric cognitive style. The conclusion is that cognitive characteristics are not the cause of the difficulty in shifting from procedural programming to the new paradigm of object-oriented programming. An alternative explanation is proactive interference: prior learning of procedural programming makes it harder to learn object-oriented programming. Further research is needed to determine whether proactive interference is the cause of the difficulty in shifting from procedural to object-oriented programming.

  15. Block-Parallel Data Analysis with DIY2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morozov, Dmitriy; Peterka, Tom

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks; and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
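
    The heart of the model is that blocks, not processes, are the unit of decomposition, so there can be more blocks than processing elements and the runtime decides where and when each block runs. Here is a shape-only sketch in Python; DIY2 itself is a C++ library, and the function names below are ours, not its API.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def make_blocks(data, n_blocks):
    """Decompose the global data into blocks; blocks are the unit of work."""
    return np.array_split(data, n_blocks)

def local_work(block):
    return block.sum()              # per-block computation

data = np.arange(1_000_000, dtype=np.float64)
blocks = make_blocks(data, 16)      # 16 blocks over only 4 workers:
                                    # the runtime is free to schedule them
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(local_work, blocks))

print(np.isclose(sum(partials), data.sum()))   # global reduction over blocks
```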

  16. Parallel Rendering of Large Time-Varying Volume Data

    NASA Technical Reports Server (NTRS)

    Garbutt, Alexander E.

    2005-01-01

    Interactive visualization of large time-varying 3D volume datasets has been, and still is, a great challenge to the modern computational world. It stretches the limits of the memory capacity, the disk space, the network bandwidth, and the CPU speed of a conventional computer. In this SURF project, we propose to develop a parallel volume rendering program on SGI's Prism, a cluster computer equipped with state-of-the-art graphics hardware. The proposed program combines both parallel computing and hardware rendering in order to achieve an interactive rendering rate. We use 3D texture mapping and a hardware shader to implement 3D volume rendering on each workstation. We use SGI's VisServer to enable remote rendering using Prism's graphics hardware. Finally, we will integrate this new program with ParVox, a parallel distributed visualization system developed at JPL. At the end of the project, we will demonstrate remote interactive visualization using this new hardware volume renderer on JPL's Prism system using a time-varying dataset from selected JPL applications.

  17. Solving Partial Differential Equations in a data-driven multiprocessor environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.

    1988-12-31

    Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-flow architectures have been proposed for the exploitation of large-scale parallelism. The implementation of some partial differential equation solvers (such as the Jacobi method) on a tagged-token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied, and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asynchronous methods in a data-driven environment. New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message-passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in data-flow PDE program graphs.
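
    The Jacobi method suits data-driven execution because every interior point of the new iterate depends only on old-iterate neighbors, so all updates in a sweep can fire in parallel; chaotic relaxation goes further and tolerates stale values. Below is a minimal sequential sketch of the Jacobi sweep itself; the grid size, source term, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def jacobi(u, f, h, n_iters=5000):
    """Jacobi sweeps for the 2-D Poisson equation -laplace(u) = f.
    The right-hand side is evaluated entirely from the old iterate,
    which is what makes every point update independent."""
    for _ in range(n_iters):
        u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:] +
                                h * h * f[1:-1, 1:-1])
    return u

n = 64
h = 1.0 / (n - 1)
u = np.zeros((n, n))            # zero boundary conditions
f = np.ones((n, n))             # constant unit source term
u = jacobi(u, f, h)
print(round(u[n // 2, n // 2], 4))   # center value; tends to ~0.074
```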

  18. cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.

    PubMed

    Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro

    2016-01-01

    Next-generation sequencing determines DNA bases, and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format or its compressed binary version (BAM). SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and executes fast. However, SAMtools requires additional implementation work to be used in parallel with, for example, OpenMP (Open Multi-Processing) libraries. Given the accumulation of next-generation sequencing data, a simple parallelization program that can support cloud and PC-cluster environments is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools, and the cljam code, written in Clojure, has fewer lines than other similar tools.
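
    The underlying parallelization pattern is easy to sketch: split the alignment records into chunks and map a per-chunk function over worker processes. The sketch below is in Python purely for illustration; cljam itself is Clojure, and none of these function names are cljam's API. The 0x4 flag bit, however, really is the SAM "unmapped" flag.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

def flag_of(line):
    return int(line.split("\t")[1])     # column 2 of a SAM record is FLAG

def count_mapped(chunk):
    # FLAG bit 0x4 marks an unmapped record; count the mapped ones.
    return sum(1 for line in chunk if not flag_of(line) & 0x4)

def chunks(lines, size=10_000):
    it = iter(lines)
    while batch := list(islice(it, size)):
        yield batch

if __name__ == "__main__":
    # Synthetic SAM-like records: every third one is flagged unmapped.
    sam = ["r%d\t%d\tchr1" % (i, 4 if i % 3 == 0 else 0) for i in range(30_000)]
    with ProcessPoolExecutor() as pool:
        print(sum(pool.map(count_mapped, chunks(sam))))   # 20000 mapped
```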

  19. Visual analysis of inter-process communication for large-scale parallel computing.

    PubMed

    Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

    2009-01-01

    In serial computation, program profiling is often helpful for optimizing key sections of code. When moving to parallel computation, not only does the code execution need to be considered, but also the communication between the different processes, which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communication on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts become difficult to work with and can even exceed the screen resolution. We propose a new visualization approach that affords more scalability and demonstrate it on systems running with up to 16,384 processes.
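
    One standard scalable summary, and a useful mental model for what such tools aggregate, is to collapse the message trace into a process-to-process volume matrix, which stays N x N however long the run is. The toy aggregation below uses an event format of our own invention; the paper's actual visualization is more sophisticated.

```python
import numpy as np

# (src, dst, bytes) message events, e.g. extracted from an MPI trace.
events = [(0, 1, 4096), (1, 2, 1024), (2, 0, 512), (0, 1, 4096)]

n_procs = 3
volume = np.zeros((n_procs, n_procs))
for src, dst, nbytes in events:
    volume[src, dst] += nbytes      # aggregate over the whole timeline

print(volume)   # entry [i, j]: total bytes sent from process i to process j
```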

  20. Using the Natural Language Paradigm (NLP) to increase vocalizations of older adults with cognitive impairments.

    PubMed

    Leblanc, Linda A; Geiger, Kaneen B; Sautter, Rachael A; Sidener, Tina M

    2007-01-01

    The Natural Language Paradigm (NLP) has proven effective in increasing spontaneous verbalizations for children with autism. This study investigated the use of NLP with older adults with cognitive impairments served at a leisure-based adult day program for seniors. Three individuals with limited spontaneous use of functional language participated in a multiple baseline design across participants. Data were collected on appropriate and inappropriate vocalizations with appropriate vocalizations coded as prompted or unprompted during baseline and treatment sessions. All participants experienced increases in appropriate speech during NLP with variable response patterns. Additionally, the two participants with substantial inappropriate vocalizations showed decreases in inappropriate speech. Implications for intervention in day programs are discussed.
