Sample records for portable parallel programming

  1. The FORCE - A highly portable parallel programming language

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  2. The FORCE: A highly portable parallel programming language

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.

  3. Application Portable Parallel Library

    NASA Technical Reports Server (NTRS)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  4. Optics Program Modified for Multithreaded Parallel Computing

    NASA Technical Reports Server (NTRS)

    Lou, John; Bedding, Dave; Basinger, Scott

    2006-01-01

    A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.

  5. F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).

  6. A design methodology for portable software on parallel computers

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Miller, Keith W.; Chrisman, Dan A.

    1993-01-01

    This final report for research that was supported by grant number NAG-1-995 documents our progress in addressing two difficulties in parallel programming. The first difficulty is developing software that will execute quickly on a parallel computer. The second difficulty is transporting software between dissimilar parallel computers. In general, we expect that more hardware-specific information will be included in software designs for parallel computers than in designs for sequential computers. This inclusion is an instance of portability being sacrificed for high performance. New parallel computers are being introduced frequently. Trying to keep one's software on the current high performance hardware, a software developer almost continually faces yet another expensive software transportation. The problem of the proposed research is to create a design methodology that helps designers to more precisely control both portability and hardware-specific programming details. The proposed research emphasizes programming for scientific applications. We completed our study of the parallelizability of a subsystem of the NASA Earth Radiation Budget Experiment (ERBE) data processing system. This work is summarized in section two. A more detailed description is provided in Appendix A ('Programming Practices to Support Eventual Parallelism'). Mr. Chrisman, a graduate student, wrote and successfully defended a Ph.D. dissertation proposal which describes our research associated with the issues of software portability and high performance. The list of research tasks are specified in the proposal. The proposal 'A Design Methodology for Portable Software on Parallel Computers' is summarized in section three and is provided in its entirety in Appendix B. We are currently studying a proposed subsystem of the NASA Clouds and the Earth's Radiant Energy System (CERES) data processing system. This software is the proof-of-concept for the Ph.D. dissertation. We have implemented and measured the performance of a portion of this subsystem on the Intel iPSC/2 parallel computer. These results are provided in section four. Our future work is summarized in section five, our acknowledgements are stated in section six, and references for published papers associated with NAG-1-995 are provided in section seven.

  7. Portable parallel portfolio optimization in the Aurora Financial Management System

    NASA Astrophysics Data System (ADS)

    Laure, Erwin; Moritsch, Hans

    2001-07-01

    Financial planning problems are formulated as large scale, stochastic, multiperiod, tree structured optimization problems. An efficient technique for solving this kind of problems is the nested Benders decomposition method. In this paper we present a parallel, portable, asynchronous implementation of this technique. To achieve our portability goals we elected the programming language Java for our implementation and used a high level Java based framework, called OpusJava, for expressing the parallelism potential as well as synchronization constraints. Our implementation is embedded within a modular decision support tool for portfolio and asset liability management, the Aurora Financial Management System.

  8. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    NASA Technical Reports Server (NTRS)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  9. A real-time MPEG software decoder using a portable message-passing library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

    1995-12-31

    We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easil ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment under which it uses. Several technical issues are discussed, including balancing of decoding speed, memory limitation, 1/0 capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible in a general-purpose parallel machine.

  10. Portable multi-node LQCD Monte Carlo simulations using OpenACC

    NASA Astrophysics Data System (ADS)

    Bonati, Claudio; Calore, Enrico; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Sanfilippo, Francesco; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele

    This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved using the OpenACC parallel programming model, used to develop a code that can be compiled for several processor architectures. The paper focuses on parallelization on multiple computing nodes using OpenACC to manage parallelism within the node, and OpenMPI to manage parallelism among the nodes. We first discuss the available strategies to be adopted to maximize performances, we then describe selected relevant details of the code, and finally measure the level of performance and scaling-performance that we are able to achieve. The work focuses mainly on GPUs, which offer a significantly high level of performances for this application, but also compares with results measured on other processors.

  11. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  12. ProperCAD: A portable object-oriented parallel environment for VLSI CAD

    NASA Technical Reports Server (NTRS)

    Ramkumar, Balkrishna; Banerjee, Prithviraj

    1993-01-01

    Most parallel algorithms for VLSI CAD proposed to date have one important drawback: they work efficiently only on machines that they were designed for. As a result, algorithms designed to date are dependent on the architecture for which they are developed and do not port easily to other parallel architectures. A new project under way to address this problem is described. A Portable object-oriented parallel environment for CAD algorithms (ProperCAD) is being developed. The objectives of this research are (1) to develop new parallel algorithms that run in a portable object-oriented environment (CAD algorithms using a general purpose platform for portable parallel programming called CARM is being developed and a C++ environment that is truly object-oriented and specialized for CAD applications is also being developed); and (2) to design the parallel algorithms around a good sequential algorithm with a well-defined parallel-sequential interface (permitting the parallel algorithm to benefit from future developments in sequential algorithms). One CAD application that has been implemented as part of the ProperCAD project, flat VLSI circuit extraction, is described. The algorithm, its implementation, and its performance on a range of parallel machines are discussed in detail. It currently runs on an Encore Multimax, a Sequent Symmetry, Intel iPSC/2 and i860 hypercubes, a NCUBE 2 hypercube, and a network of Sun Sparc workstations. Performance data for other applications that were developed are provided: namely test pattern generation for sequential circuits, parallel logic synthesis, and standard cell placement.

  13. Portable parallel stochastic optimization for the design of aeropropulsion components

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Rhodes, G. S.

    1994-01-01

    This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as review of portable, parallel programming environments. The first effort was to implement the MSO methodology for a problem using the portable parallel programming language, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed-Civil Transport, and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shipman, Galen M.

    These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematicmore » approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.« less

  15. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed Central

    Nadkarni, P. M.; Miller, P. L.

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632

  16. A portable MPI-based parallel vector template library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  17. A Portable MPI-Based Parallel Vector Template Library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C + + by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of c or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  18. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  19. Parallel and Portable Monte Carlo Particle Transport

    NASA Astrophysics Data System (ADS)

    Lee, S. R.; Cummings, J. C.; Nolen, S. D.; Keen, N. D.

    1997-08-01

    We have developed a multi-group, Monte Carlo neutron transport code in C++ using object-oriented methods and the Parallel Object-Oriented Methods and Applications (POOMA) class library. This transport code, called MC++, currently computes k and α eigenvalues of the neutron transport equation on a rectilinear computational mesh. It is portable to and runs in parallel on a wide variety of platforms, including MPPs, clustered SMPs, and individual workstations. It contains appropriate classes and abstractions for particle transport and, through the use of POOMA, for portable parallelism. Current capabilities are discussed, along with physics and performance results for several test problems on a variety of hardware, including all three Accelerated Strategic Computing Initiative (ASCI) platforms. Current parallel performance indicates the ability to compute α-eigenvalues in seconds or minutes rather than days or weeks. Current and future work on the implementation of a general transport physics framework (TPF) is also described. This TPF employs modern C++ programming techniques to provide simplified user interfaces, generic STL-style programming, and compile-time performance optimization. Physics capabilities of the TPF will be extended to include continuous energy treatments, implicit Monte Carlo algorithms, and a variety of convergence acceleration techniques such as importance combing.

  20. Developing Information Power Grid Based Algorithms and Software

    NASA Technical Reports Server (NTRS)

    Dongarra, Jack

    1998-01-01

    This exploratory study initiated our effort to understand performance modeling on parallel systems. The basic goal of performance modeling is to understand and predict the performance of a computer program or set of programs on a computer system. Performance modeling has numerous applications, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. Our work lays the basis for the construction of parallel libraries that allow for the reconstruction of application codes on several distinct architectures so as to assure performance portability. Following our strategy, once the requirements of applications are well understood, one can then construct a library in a layered fashion. The top level of this library will consist of architecture-independent geometric, numerical, and symbolic algorithms that are needed by the sample of applications. These routines should be written in a language that is portable across the targeted architectures.

  1. Parallel algorithms for modeling flow in permeable media. Annual report, February 15, 1995 - February 14, 1996

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    G.A. Pope; K. Sephernoori; D.C. McKinney

    1996-03-15

    This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time that is required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to CRAY T3D and CRAY T3E, distributed-memory, multiprocessor computersmore » using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that the portability of the original code across different computer architectures was mad possible.« less

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harms, Kevin; Oral, H. Sarp; Atchley, Scott

    The Oak Ridge and Argonne Leadership Computing Facilities are both receiving new systems under the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) program. Because they are both part of the INCITE program, applications need to be portable between these two facilities. However, the Summit and Aurora systems will be vastly different architectures, including their I/O subsystems. While both systems will have POSIX-compliant parallel file systems, their Burst Buffer technologies will be different. This difference may pose challenges to application portability between facilities. Application developers need to pay attention to specific burst buffer implementations to maximize code portability.

  3. Selective, Embedded, Just-In-Time Specialization (SEJITS): Portable Parallel Performance from Sequential, Productive, Embedded Domain-Specific Languages

    DTIC Science & Technology

    2012-12-01

    identity operation SIMD Single instruction, multiple datastream parallel computing Scala A byte-compiled programming language featuring dynamic type...Specific Languages 5a. CONTRACT NUMBER FA8750-10-1-0191 5b. GRANT NUMBER N/A 5c. PROGRAM ELEMENT NUMBER 61101E 6. AUTHOR(S) Armando Fox 5d...application performance, but usually must rely on efficiency programmers who are experts in explicit parallel programming to achieve it. Since such efficiency

  4. ASC-ATDM Performance Portability Requirements for 2015-2019

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Edwards, Harold C.; Trott, Christian Robert

    This report outlines the research, development, and support requirements for the Advanced Simulation and Computing (ASC ) Advanced Technology, Development, and Mitigation (ATDM) Performance Portability (a.k.a., Kokkos) project for 2015 - 2019 . The research and development (R&D) goal for Kokkos (v2) has been to create and demonstrate a thread - parallel programming model a nd standard C++ library - based implementation that enables performance portability across diverse manycore architectures such as multicore CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. This R&D goal has been achieved for algorithms that use data parallel pat terns including parallel - for, parallelmore » - reduce, and parallel - scan. Current R&D is focusing on hierarchical parallel patterns such as a directed acyclic graph (DAG) of asynchronous tasks where each task contain s nested data parallel algorithms. This five y ear plan includes R&D required to f ully and performance portably exploit thread parallelism across current and anticipated next generation platforms (NGP). The Kokkos library is being evaluated by many projects exploring algorithm s and code design for NGP. Some production libraries and applications such as Trilinos and LAMMPS have already committed to Kokkos as their foundation for manycore parallelism an d performance portability. These five year requirements includes support required for current and antic ipated ASC projects to be effective and productive in their use of Kokkos on NGP. The greatest risk to the success of Kokkos and ASC projects relying upon Kokkos is a lack of staffing resources to support Kokkos to the degree needed by these ASC projects. This support includes up - to - date tutorials, documentation, multi - platform (hardware and software stack) testing, minor feature enhancements, thread - scalable algorithm consulting, and managing collaborative R&D.« less

  5. Users manual for the Chameleon parallel programming tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gropp, W.; Smith, B.

    1993-06-01

    Message passing is a common method for writing programs for distributed-memory parallel computers. Unfortunately, the lack of a standard for message passing has hampered the construction of portable and efficient parallel programs. In an attempt to remedy this problem, a number of groups have developed their own message-passing systems, each with its own strengths and weaknesses. Chameleon is a second-generation system of this type. Rather than replacing these existing systems, Chameleon is meant to supplement them by providing a uniform way to access many of these systems. Chameleon`s goals are to (a) be very lightweight (low over-head), (b) be highlymore » portable, and (c) help standardize program startup and the use of emerging message-passing operations such as collective operations on subsets of processors. Chameleon also provides a way to port programs written using PICL or Intel NX message passing to other systems, including collections of workstations. Chameleon is tracking the Message-Passing Interface (MPI) draft standard and will provide both an MPI implementation and an MPI transport layer. Chameleon provides support for heterogeneous computing by using p4 and PVM. Chameleon`s support for homogeneous computing includes the portable libraries p4, PICL, and PVM and vendor-specific implementation for Intel NX, IBM EUI (SP-1), and Thinking Machines CMMD (CM-5). Support for Ncube and PVM 3.x is also under development.« less

  6. Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.

  7. The Castle Project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tom Anderson; David Culler; James Demmel

    2000-02-16

    The goal of the Castle project was to provide a parallel programming environment that enables the construction of high performance applications that run portably across many platforms. The authors approach was to design and implement a multilayered architecture, with higher levels building on lower ones to ensure portability, but with care taken not to introduce abstractions that sacrifice performance.

  8. Bilingual parallel programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, I.; Overbeek, R.

    1990-01-01

    Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach providesmore » and effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.« less

  9. The BLAZE language: A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.

  10. An interactive parallel programming environment applied in atmospheric science

    NASA Technical Reports Server (NTRS)

    vonLaszewski, G.

    1996-01-01

    This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.

  11. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

    DOE PAGES

    Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel

    2014-07-22

    The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diversemore » manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.« less

  12. MPF: A portable message passing facility for shared memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

    1987-01-01

    The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.

  13. Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid

    1999-01-01

    The ability to dynamically adapt an unstructured -rid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the view point of portability on various multiprocessor platforms We address this problem by developing PLUM, tin automatic anti architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with a goal to providing a global view of system loads across processors. Experiments on, an SP2 and an Origin2000 demonstrate the portability of our approach which achieves superb load balance at the cost of minimal extra overhead.

  14. Parallel, Distributed Scripting with Python

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI librarymore » gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider the a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.« less

  15. The BLAZE language - A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  16. Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

    PubMed

    Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

    2016-01-01

    Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.

  17. A C++ Thread Package for Concurrent and Parallel Programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jie Chen; William Watson

    1999-11-01

    Recently thread libraries have become a common entity on various operating systems such as Unix, Windows NT and VxWorks. Those thread libraries offer significant performance enhancement by allowing applications to use multiple threads running either concurrently or in parallel on multiprocessors. However, the incompatibilities between native libraries introduces challenges for those who wish to develop portable applications.

  18. Design and optimization of a portable LQCD Monte Carlo code using OpenACC

    NASA Astrophysics Data System (ADS)

    Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele

    The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processor Units (GPUs), exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) become necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and prone to error to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenAcc, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.

  19. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

  20. The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

    NASA Technical Reports Server (NTRS)

    Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies, as a result the performance of parallel programs with compiler directives has also made improvements. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich), to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.

  1. Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems

    NASA Technical Reports Server (NTRS)

    Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)

    2002-01-01

    A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.

  2. Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems

    NASA Technical Reports Server (NTRS)

    Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)

    2001-01-01

    A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel super computers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.

  3. Tolerant (parallel) Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

  4. MPIRUN: A Portable Loader for Multidisciplinary and Multi-Zonal Applications

    NASA Technical Reports Server (NTRS)

    Fineberg, Samuel A.; Woodrow, Thomas S. (Technical Monitor)

    1994-01-01

    Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. One method for implementing the message passing portion of these codes is with the new Message Passing Interface (MPI) standard. Unfortunately, this standard only specifies the message passing portion of an application, but does not specify any portable mechanisms for loading an application. MPIRUN was developed to provide a portable means for loading MPI programs, and was specifically targeted at multidisciplinary and multi-zonal applications. Programs using MPIRUN for loading and MPI for message passing are then portable between all machines supported by MPIRUN. MPIRUN is currently implemented for the Intel iPSC/860, TMC CM5, IBM SP-1 and SP-2, Intel Paragon, and workstation clusters. Further, MPIRUN is designed to be simple enough to port easily to any system supporting MPI.

  5. MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

    NASA Astrophysics Data System (ADS)

    Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.

    1995-03-01

    PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.

  6. MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simunovic, S.; Zacharia, T.; Baltas, N.

    1995-04-01

    PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. Themore » Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.« less

  7. The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete; hide

    1998-01-01

    Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.

  8. Testing New Programming Paradigms with NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of supporting the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure in next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.

  9. Force user's manual: A portable, parallel FORTRAN

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1990-01-01

    The use of Force, a parallel, portable FORTRAN on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, Alliant computers on which it is installed.

  10. A high-speed linear algebra library with automatic parallelism

    NASA Technical Reports Server (NTRS)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  11. Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

    PubMed Central

    Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew

    2014-01-01

    Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950

  12. Manycore Performance-Portability: Kokkos Multidimensional Array Library

    DOE PAGES

    Edwards, H. Carter; Sunderland, Daniel; Porter, Vicki; ...

    2012-01-01

    Large, complex scientific and engineering application code have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides library-based approach to implement computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices each with its own memory space, (2) data parallel kernels and (3) multidimensional arrays. Kernel executionmore » performance is, especially for NVIDIA® devices, extremely dependent on data access patterns. Optimal data access pattern can be different for different manycore devices – potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introduce device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].« less

  13. Implementing Multidisciplinary and Multi-Zonal Applications Using MPI

    NASA Technical Reports Server (NTRS)

    Fineberg, Samuel A.

    1995-01-01

    Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. Unfortunately, simple message passing models, like Intel's NX library, only allow point-to-point and global communication within a single system-defined partition. This makes implementation of these applications quite difficult, if not impossible. In this report it is shown that the new Message Passing Interface (MPI) standard is a viable portable library for implementing the message passing portion of multidisciplinary applications. Further, with the extension of a portable loader, fully portable multidisciplinary application programs can be developed. Finally, the performance of MPI is compared to that of some native message passing libraries. This comparison shows that MPI can be implemented to deliver performance commensurate with native message libraries.

  14. Algorithms and programming tools for image processing on the MPP, part 2

    NASA Technical Reports Server (NTRS)

    Reeves, Anthony P.

    1986-01-01

    A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites.

  15. PISCES: An environment for parallel scientific computation

    NASA Technical Reports Server (NTRS)

    Pratt, T. W.

    1985-01-01

    The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.

  16. A survey of packages for large linear systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Milne, Brent

    2000-02-11

    This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages and the evaluation process may serve as an example on how to evaluate these packages. The information contained here include feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to verymore » large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve,ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well designed user interface, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving. Thus their user interface may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes. Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.« less

  17. Portability and Cross-Platform Performance of an MPI-Based Parallel Polygon Renderer

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1999-01-01

    Visualizing the results of computations performed on large-scale parallel computers is a challenging problem, due to the size of the datasets involved. One approach is to perform the visualization and graphics operations in place, exploiting the available parallelism to obtain the necessary rendering performance. Over the past several years, we have been developing algorithms and software to support visualization applications on NASA's parallel supercomputers. Our results have been incorporated into a parallel polygon rendering system called PGL. PGL was initially developed on tightly-coupled distributed-memory message-passing systems, including Intel's iPSC/860 and Paragon, and IBM's SP2. Over the past year, we have ported it to a variety of additional platforms, including the HP Exemplar, SGI Origin2OOO, Cray T3E, and clusters of Sun workstations. In implementing PGL, we have had two primary goals: cross-platform portability and high performance. Portability is important because (1) our manpower resources are limited, making it difficult to develop and maintain multiple versions of the code, and (2) NASA's complement of parallel computing platforms is diverse and subject to frequent change. Performance is important in delivering adequate rendering rates for complex scenes and ensuring that parallel computing resources are used effectively. Unfortunately, these two goals are often at odds. In this paper we report on our experiences with portability and performance of the PGL polygon renderer across a range of parallel computing platforms.

  18. Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.

    PubMed

    Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano

    2014-09-09

    A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.

  19. A portable approach for PIC on emerging architectures

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor

    2016-03-01

    A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers, is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programing, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.

  20. Time-domain seismic modeling in viscoelastic media for full waveform inversion on heterogeneous computing platforms with OpenCL

    NASA Astrophysics Data System (ADS)

    Fabien-Ouellet, Gabriel; Gloaguen, Erwan; Giroux, Bernard

    2017-03-01

    Full Waveform Inversion (FWI) aims at recovering the elastic parameters of the Earth by matching recordings of the ground motion with the direct solution of the wave equation. Modeling the wave propagation for realistic scenarios is computationally intensive, which limits the applicability of FWI. The current hardware evolution brings increasing parallel computing power that can speed up the computations in FWI. However, to take advantage of the diversity of parallel architectures presently available, new programming approaches are required. In this work, we explore the use of OpenCL to develop a portable code that can take advantage of the many parallel processor architectures now available. We present a program called SeisCL for 2D and 3D viscoelastic FWI in the time domain. The code computes the forward and adjoint wavefields using finite-difference and outputs the gradient of the misfit function given by the adjoint state method. To demonstrate the code portability on different architectures, the performance of SeisCL is tested on three different devices: Intel CPUs, NVidia GPUs and Intel Xeon PHI. Results show that the use of GPUs with OpenCL can speed up the computations by nearly two orders of magnitudes over a single threaded application on the CPU. Although OpenCL allows code portability, we show that some device-specific optimization is still required to get the best performance out of a specific architecture. Using OpenCL in conjunction with MPI allows the domain decomposition of large models on several devices located on different nodes of a cluster. For large enough models, the speedup of the domain decomposition varies quasi-linearly with the number of devices. Finally, we investigate two different approaches to compute the gradient by the adjoint state method and show the significant advantages of using OpenCL for FWI.

  1. The force on the flex: Global parallelism and portability

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1986-01-01

    A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporations's implementation of a Fortran system in the parallel environment.

  2. Portable LQCD Monte Carlo code using OpenACC

    NASA Astrophysics Data System (ADS)

    Bonati, Claudio; Calore, Enrico; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Fabio Schifano, Sebastiano; Silvi, Giorgio; Tripiccione, Raffaele

    2018-03-01

    Varying from multi-core CPU processors to many-core GPUs, the present scenario of HPC architectures is extremely heterogeneous. In this context, code portability is increasingly important for easy maintainability of applications; this is relevant in scientific computing where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art production level LQCD Monte Carlo application, using the OpenACC directives model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify the mapping of the code on the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.

  3. Automatic Parallelization of Numerical Python Applications using the Global Arrays Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daily, Jeffrey A.; Lewis, Robert R.

    2011-11-30

    Global Arrays is a software system from Pacific Northwest National Laboratory that enables an efficient, portable, and parallel shared-memory programming interface to manipulate distributed dense arrays. The NumPy module is the de facto standard for numerical calculation in the Python programming language, a language whose use is growing rapidly in the scientific and engineering communities. NumPy provides a powerful N-dimensional array class as well as other scientific computing capabilities. However, like the majority of the core Python modules, NumPy is inherently serial. Using a combination of Global Arrays and NumPy, we have reimplemented NumPy as a distributed drop-in replacement calledmore » Global Arrays in NumPy (GAiN). Serial NumPy applications can become parallel, scalable GAiN applications with only minor source code changes. Scalability studies of several different GAiN applications will be presented showing the utility of developing serial NumPy codes which can later run on more capable clusters or supercomputers.« less

  4. A survey of program slicing for software engineering

    NASA Technical Reports Server (NTRS)

    Beck, Jon

    1993-01-01

    This research concerns program slicing which is used as a tool for program maintainence of software systems. Program slicing decreases the level of effort required to understand and maintain complex software systems. It was first designed as a debugging aid, but it has since been generalized into various tools and extended to include program comprehension, module cohesion estimation, requirements verification, dead code elimination, and maintainence of several software systems, including reverse engineering, parallelization, portability, and reuse component generation. This paper seeks to address and define terminology, theoretical concepts, program representation, different program graphs, developments in static slicing, dynamic slicing, and semantics and mathematical models. Applications for conventional slicing are presented, along with a prognosis of future work in this field.

  5. Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

    NASA Astrophysics Data System (ADS)

    Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

    2018-01-01

    This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.

  6. Evaluation of a parallel implementation of the learning portion of the backward error propagation neural network: experiments in artifact identification.

    PubMed Central

    Sittig, D. F.; Orr, J. A.

    1991-01-01

    Various methods have been proposed in an attempt to solve problems in artifact and/or alarm identification including expert systems, statistical signal processing techniques, and artificial neural networks (ANN). ANNs consist of a large number of simple processing units connected by weighted links. To develop truly robust ANNs, investigators are required to train their networks on huge training data sets, requiring enormous computing power. We implemented a parallel version of the backward error propagation neural network training algorithm in the widely portable parallel programming language C-Linda. A maximum speedup of 4.06 was obtained with six processors. This speedup represents a reduction in total run-time from approximately 6.4 hours to 1.5 hours. We conclude that use of the master-worker model of parallel computation is an excellent method for obtaining speedups in the backward error propagation neural network training algorithm. PMID:1807607

  7. Parallel and serial computing tools for testing single-locus and epistatic SNP effects of quantitative traits in genome-wide association studies

    PubMed Central

    Ma, Li; Runesha, H Birali; Dvorkin, Daniel; Garbe, John R; Da, Yang

    2008-01-01

    Background Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS. Results The EPISNPmpi and EPISNP computer programs were developed for testing single-locus and epistatic SNP effects on quantitative traits in GWAS, including tests of three single-locus effects for each SNP (SNP genotypic effect, additive and dominance effects) and five epistasis effects for each pair of SNPs (two-locus interaction, additive × additive, additive × dominance, dominance × additive, and dominance × dominance) based on the extended Kempthorne model. EPISNPmpi is the parallel computing program for epistasis testing in large scale GWAS and achieved excellent scalability for large scale analysis and portability for various parallel computing platforms. EPISNP is the serial computing program based on the EPISNPmpi code for epistasis testing in small scale GWAS using commonly available operating systems and computer hardware. Three serial computing utility programs were developed for graphical viewing of test results and epistasis networks, and for estimating CPU time and disk space requirements. Conclusion The EPISNPmpi parallel computing program provides an effective computing tool for epistasis testing in large scale GWAS, and the epiSNP serial computing programs are convenient tools for epistasis analysis in small scale GWAS using commonly available computer hardware. PMID:18644146

  8. Open SHMEM Reference Implementation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pritchard, Howard; Curtis, Anthony; Welch, Aaron

    2016-05-12

    OpenSHMEM is an effort to create a specification for a standardized API for parallel programming in the Partitioned Global Address Space. Along with the specification the project is also creating a reference implementation of the API. This implementation attempts to be portable, to allow it to be deployed in multiple environments, and to be a starting point for implementations targeted to particular hardware platforms. It will also serve as a springboard for future development of the API.

  9. MPI-IO: A Parallel File I/O Interface for MPI Version 0.3

    NASA Technical Reports Server (NTRS)

    Corbett, Peter; Feitelson, Dror; Hsu, Yarsun; Prost, Jean-Pierre; Snir, Marc; Fineberg, Sam; Nitzberg, Bill; Traversat, Bernard; Wong, Parkson

    1995-01-01

    Thanks to MPI [9], writing portable message passing parallel programs is almost a reality. One of the remaining problems is file I/0. Although parallel file systems support similar interfaces, the lack of a standard makes developing a truly portable program impossible. Further, the closest thing to a standard, the UNIX file interface, is ill-suited to parallel computing. Working together, IBM Research and NASA Ames have drafted MPI-I0, a proposal to address the portable parallel I/0 problem. In a nutshell, this proposal is based on the idea that I/0 can be modeled as message passing: writing to a file is like sending a message, and reading from a file is like receiving a message. MPI-IO intends to leverage the relatively wide acceptance of the MPI interface in order to create a similar I/0 interface. The above approach can be materialized in different ways. The current proposal represents the result of extensive discussions (and arguments), but is by no means finished. Many changes can be expected as additional participants join the effort to define an interface for portable I/0. This document is organized as follows. The remainder of this section includes a discussion of some issues that have shaped the style of the interface. Section 2 presents an overview of MPI-IO as it is currently defined. It specifies what the interface currently supports and states what would need to be added to the current proposal to make the interface more complete and robust. The next seven sections contain the interface definition itself. Section 3 presents definitions and conventions. Section 4 contains functions for file control, most notably open. Section 5 includes functions for independent I/O, both blocking and nonblocking. Section 6 includes functions for collective I/O, both blocking and nonblocking. Section 7 presents functions to support system-maintained file pointers, and shared file pointers. Section 8 presents constructors that can be used to define useful filetypes (the role of filetypes is explained in Section 2 below). Section 9 presents how the error handling mechanism of MPI is supported by the MPI-IO interface. All this is followed by a set of appendices, which contain information about issues that have not been totally resolved yet, and about design considerations. The reader can find there the motivation behind some of our design choices. More information on this would definitely be welcome and will be included in a further release of this document. The first appendix contains a description of MPI-I0's 'hints' structure which is used when opening a file. Appendix B is a discussion of various issues in the support for file pointers. Appendix C explains what we mean in talking about atomic access. Appendix D provides detailed examples of filetype constructors, and Appendix E contains a collection of arguments for and against various design decisions.

  10. Targeting multiple heterogeneous hardware platforms with OpenCL

    NASA Astrophysics Data System (ADS)

    Fox, Paul A.; Kozacik, Stephen T.; Humphrey, John R.; Paolini, Aaron; Kuller, Aryeh; Kelmelis, Eric J.

    2014-06-01

    The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations have substantial implementation differences. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations and other factors. These factors make it challenging to produce code that is portable in practice, resulting in much OpenCL code being duplicated for each hardware platform being targeted. This duplication of effort offsets the principal advantage of OpenCL: portability. The use of certain coding practices can mitigate this problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore some general practices for producing performant code that are effective across platforms. Additionally, we explore some ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. The minimum requirement for portability implies avoiding the use of OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism allows hardware to take advantage of the types of parallelism it supports, from the task level down to explicit vector operations. Static optimizations and branch elimination in device code help the platform compiler to effectively optimize programs. Modularization of some code is important to allow operations to be chosen for performance on target hardware. Optional subroutines exploiting explicit memory locality allow for different memory hierarchies to be exploited for maximum performance. The C preprocessor and JIT compilation using the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware-specific optimizations as necessary.

  11. Automated Generation of Message-Passing Programs: An Evaluation Using CAPTools

    NASA Technical Reports Server (NTRS)

    Hribar, Michelle R.; Jin, Haoqiang; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Scientists at NASA Ames Research Center have been developing computational aeroscience applications on highly parallel architectures over the past ten years. During that same time period, a steady transition of hardware and system software also occurred, forcing us to expend great efforts into migrating and re-coding our applications. As applications and machine architectures become increasingly complex, the cost and time required for this process will become prohibitive. In this paper, we present the first set of results in our evaluation of interactive parallelization tools. In particular, we evaluate CAPTool's ability to parallelize computational aeroscience applications. CAPTools was tested on serial versions of the NAS Parallel Benchmarks and ARC3D, a computational fluid dynamics application, on two platforms: the SGI Origin 2000 and the Cray T3E. This evaluation includes performance, amount of user interaction required, limitations and portability. Based on these results, a discussion on the feasibility of computer aided parallelization of aerospace applications is presented along with suggestions for future work.

  12. A portable platform for accelerated PIC codes and its application to GPUs using OpenACC

    NASA Astrophysics Data System (ADS)

    Hariri, F.; Tran, T. M.; Jocksch, A.; Lanti, E.; Progsch, J.; Messmer, P.; Brunner, S.; Gheller, C.; Villard, L.

    2016-10-01

    We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphic Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the one on an Intel Sandy bridge 8-core CPU by a factor of 3.4.

  13. Toward Enhancing OpenMP's Work-Sharing Directives

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapman, B M; Huang, L; Jin, H

    2006-05-17

    OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.

  14. Parallel community climate model: Description and user`s guide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain intomore » geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.« less

  15. Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver

    NASA Astrophysics Data System (ADS)

    Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre

    2014-06-01

    This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 106 spatial cells and 1 × 1012 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40:74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool.

  16. Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

    NASA Astrophysics Data System (ADS)

    Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

    2018-01-01

    We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.

  17. Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool

    NASA Astrophysics Data System (ADS)

    Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.

    1997-12-01

    Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.

  18. Maximal clique enumeration with data-parallel primitives

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lessley, Brenton; Perciano, Talita; Mathai, Manish

    The enumeration of all maximal cliques in an undirected graph is a fundamental problem arising in several research areas. We consider maximal clique enumeration on shared-memory, multi-core architectures and introduce an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures. We study the performance of the algorithm via experiments varying over benchmark graphs and architectures. Overall, we observe that our algorithm achieves up to a 33-time speedup and 9-time speedup over state-of-the-art distributed and serial algorithms, respectively, for graphs with higher ratios of maximal cliques to total cliques. Further, we attainmore » additional speedups on a GPU architecture, demonstrating the portable performance of our data-parallel design.« less

  19. Particle Laden Turbulence in a Radiation Environment Using a Portable High Preformace Solver Based on the Legion Runtime System

    NASA Astrophysics Data System (ADS)

    Torres, Hilario; Iaccarino, Gianluca

    2017-11-01

    Soleil-X is a multi-physics solver being developed at Stanford University as a part of the Predictive Science Academic Alliance Program II. Our goal is to conduct high fidelity simulations of particle laden turbulent flows in a radiation environment for solar energy receiver applications as well as to demonstrate our readiness to effectively utilize next generation Exascale machines. The novel aspect of Soleil-X is that it is built upon the Legion runtime system to enable easy portability to different parallel distributed heterogeneous architectures while also being written entirely in high-level/high-productivity languages (Ebb and Regent). An overview of the Soleil-X software architecture will be given. Results from coupled fluid flow, Lagrangian point particle tracking, and thermal radiation simulations will be presented. Performance diagnostic tools and metrics corresponding the the same cases will also be discussed. US Department of Energy, National Nuclear Security Administration.

  20. Performance Analysis and Portability of the PLUM Load Balancing System

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.

    1998-01-01

    The ability to dynamically adapt an unstructured mesh is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult. To address this problem, we have developed PLUM, an automatic portable framework for performing adaptive numerical computations in a message-passing environment. PLUM requires that all data be globally redistributed after each mesh adaption to achieve load balance. We present an algorithm for minimizing this remapping overhead by guaranteeing an optimal processor reassignment. We also show that the data redistribution cost can be significantly reduced by applying our heuristic processor reassignment algorithm to the default mapping of the parallel partitioner. Portability is examined by comparing performance on a SP2, an Origin2000, and a T3E. Results show that PLUM can be successfully ported to different platforms without any code modifications.

  1. Simple and powerful visual stimulus generator.

    PubMed

    Kremlácek, J; Kuba, M; Kubová, Z; Vít, F

    1999-02-01

    We describe a cheap, simple, portable and efficient approach to visual stimulation for neurophysiology which does not need any special hardware equipment. The method based on an animation technique uses the FLI autodesk animator format. This form of the animation is replayed by a special program ('player') providing synchronisation pulses toward recording system via parallel port. The 'player is running on an IBM compatible personal computer under MS-DOS operation system and stimulus is displayed on a VGA computer monitor. Various stimuli created with this technique for visual evoked potentials (VEPs) are presented.

  2. On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics

    PubMed Central

    Calcagno, Cristina; Coppo, Mario

    2014-01-01

    The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed. PMID:25050327

  3. On designing multicore-aware simulators for systems biology endowed with OnLine statistics.

    PubMed

    Aldinucci, Marco; Calcagno, Cristina; Coppo, Mario; Damiani, Ferruccio; Drocco, Maurizio; Sciacca, Eva; Spinella, Salvatore; Torquati, Massimo; Troina, Angelo

    2014-01-01

    The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.

  4. Roofline model toolkit: A practical tool for architectural and program analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian

    We present preliminary results of the Roofline Toolkit for multicore, many core, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measuremore » sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.« less

  5. Praxis language reference manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Walker, J.H.

    1981-01-01

    This document is a language reference manual for the programming language Praxis. The document contains the specifications that must be met by any compiler for the language. The Praxis language was designed for systems programming in real-time process applications. Goals for the language and its implementations are: (1) highly efficient code generated by the compiler; (2) program portability; (3) completeness, that is, all programming requirements can be met by the language without needing an assembler; and (4) separate compilation to aid in design and management of large systems. The language does not provide any facilities for input/output, stack and queuemore » handling, string operations, parallel processing, or coroutine processing. These features can be implemented as routines in the language, using machine-dependent code to take advantage of facilities in the control environment on different machines.« less

  6. Implementation of Helioseismic Data Reduction and Diagnostic Techniques on Massively Parallel Architectures

    NASA Technical Reports Server (NTRS)

    Korzennik, Sylvain

    1997-01-01

    Under the direction of Dr. Rhodes, and the technical supervision of Dr. Korzennik, the data assimilation of high spatial resolution solar dopplergrams has been carried out throughout the program on the Intel Delta Touchstone supercomputer. With the help of a research assistant, partially supported by this grant, and under the supervision of Dr. Korzennik, code development was carried out at SAO, using various available resources. To ensure cross-platform portability, PVM was selected as the message passing library. A parallel implementation of power spectra computation for helioseismology data reduction, using PVM was successfully completed. It was successfully ported to SMP architectures (i.e. SUN), and to some MPP architectures (i.e. the CM5). Due to limitation of the implementation of PVM on the Cray T3D, the port to that architecture was not completed at the time.

  7. Parallelization of a Monte Carlo particle transport simulation code

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.

  8. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    NASA Astrophysics Data System (ADS)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195

  9. A Parallel Relational Database Management System Approach to Relevance Feedback in Information Retrieval.

    ERIC Educational Resources Information Center

    Lundquist, Carol; Frieder, Ophir; Holmes, David O.; Grossman, David

    1999-01-01

    Describes a scalable, parallel, relational database-drive information retrieval engine. To support portability across a wide range of execution environments, all algorithms adhere to the SQL-92 standard. By incorporating relevance feedback algorithms, accuracy is enhanced over prior database-driven information retrieval efforts. Presents…

  10. Development of a Portable 3CCD Camera System for Multispectral Imaging of Biological Samples

    PubMed Central

    Lee, Hoyoung; Park, Soo Hyun; Noh, Sang Ha; Lim, Jongguk; Kim, Moon S.

    2014-01-01

    Recent studies have suggested the need for imaging devices capable of multispectral imaging beyond the visible region, to allow for quality and safety evaluations of agricultural commodities. Conventional multispectral imaging devices lack flexibility in spectral waveband selectivity for such applications. In this paper, a recently developed portable 3CCD camera with significant improvements over existing imaging devices is presented. A beam-splitter prism assembly for 3CCD was designed to accommodate three interference filters that can be easily changed for application-specific multispectral waveband selection in the 400 to 1000 nm region. We also designed and integrated electronic components on printed circuit boards with firmware programming, enabling parallel processing, synchronization, and independent control of the three CCD sensors, to ensure the transfer of data without significant delay or data loss due to buffering. The system can stream 30 frames (3-waveband images in each frame) per second. The potential utility of the 3CCD camera system was demonstrated in the laboratory for detecting defect spots on apples. PMID:25350510

  11. Communication Studies of DMP and SMP Machines

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors are consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.

  12. WaveJava: Wavelet-based network computing

    NASA Astrophysics Data System (ADS)

    Ma, Kun; Jiao, Licheng; Shi, Zhuoer

    1997-04-01

    Wavelet is a powerful theory, but its successful application still needs suitable programming tools. Java is a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multi- threaded, dynamic language. This paper addresses the design and development of a cross-platform software environment for experimenting and applying wavelet theory. WaveJava, a wavelet class library designed by the object-orient programming, is developed to take advantage of the wavelets features, such as multi-resolution analysis and parallel processing in the networking computing. A new application architecture is designed for the net-wide distributed client-server environment. The data are transmitted with multi-resolution packets. At the distributed sites around the net, these data packets are done the matching or recognition processing in parallel. The results are fed back to determine the next operation. So, the more robust results can be arrived quickly. The WaveJava is easy to use and expand for special application. This paper gives a solution for the distributed fingerprint information processing system. It also fits for some other net-base multimedia information processing, such as network library, remote teaching and filmless picture archiving and communications.

  13. A practical approach to portability and performance problems on massively parallel supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beazley, D.M.; Lomdahl, P.S.

    1994-12-08

    We present an overview of the tactics we have used to achieve a high-level of performance while improving portability for a large-scale molecular dynamics code SPaSM. SPaSM was originally implemented in ANSI C with message passing for the Connection Machine 5 (CM-5). In 1993, SPaSM was selected as one of the winners in the IEEE Gordon Bell Prize competition for sustaining 50 Gflops on the 1024 node CM-5 at Los Alamos National Laboratory. Achieving this performance on the CM-5 required rewriting critical sections of code in CDPEAC assembler language. In addition, the code made extensive use of CM-5 parallel I/Omore » and the CMMD message passing library. Given this highly specialized implementation, we describe how we have ported the code to the Cray T3D and high performance workstations. In addition we will describe how it has been possible to do this using a single version of source code that runs on all three platforms without sacrificing any performance. Sound too good to be true? We hope to demonstrate that one can realize both code performance and portability without relying on the latest and greatest prepackaged tool or parallelizing compiler.« less

  14. FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

    NASA Astrophysics Data System (ADS)

    Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

    2009-12-01

    FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems. Program summaryProgram title: FILMPAR Catalogue identifier: AEEL_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 530 421 No. of bytes in distributed program, including test data, etc.: 1 960 313 Distribution format: tar.gz Programming language: C++ and MPI Computer: Desktop, server Operating system: Unix/Linux Mac OS X Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors RAM: 512 MBytes Classification: 12 External routines: GNU C/C++, MPI Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering, industrial and physical applications. However, despite recent modelling advances, the accurate numerical solution of the equations governing such problems is still at a relatively early stage. Indeed, recent studies employing a simplifying long-wave approximation have shown that highly efficient numerical methods are necessary to solve the resulting lubrication equations in order to achieve the level of grid resolution required to accurately capture the effects of micro- and nano-scale topographical features. Solution method: A portable parallel multigrid algorithm has been developed for the above purpose, for the particular case of flow over submerged topographical features. Within the multigrid framework adopted, a W-cycle is used to accelerate convergence in respect of the time dependent nature of the problem, with relaxation sweeps performed using a fixed number of pre- and post-Red-Black Gauss-Seidel Newton iterations. In addition, the algorithm incorporates automatic adaptive time-stepping to avoid the computational expense associated with repeated time-step failure. Running time: 1.31 minutes using 128 processors on BlueGene/P with a problem size of over 16.7 million mesh points.

  15. Early Experiences Writing Performance Portable OpenMP 4 Codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joubert, Wayne; Hernandez, Oscar R

    In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the traditional first touch policy approach in traditional OpenMP. To that end, we use recent Cray and Intel compilersmore » to measure the performance variations of a simple application kernel when executed on the OLCF s Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.« less

  16. Scalable and portable visualization of large atomistic datasets

    NASA Astrophysics Data System (ADS)

    Sharma, Ashish; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2004-10-01

    A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms, which have a high probability of being visible. Finally a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms. Program summaryTitle of program: Atomsviewer Catalogue identifier: ADUM Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADUM Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5 Programming languages used: C++, C and OpenGL Memory required to execute with typical data: 1 gigabyte of RAM High speed storage required: 60 gigabytes No. of lines in the distributed program including test data, etc.: 550 241 No. of bytes in the distributed program including test data, etc.: 6 258 245 Number of bits in a word: Arbitrary Number of processors used: 1 Has the code been vectorized or parallelized: No Distribution format: tar gzip file Nature of physical problem: Scientific visualization of atomic systems Method of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data minimization, and levels-of-detail for minimal rendering Restrictions on the complexity of the problem: None Typical running time: The program is interactive in its execution Unusual features of the program: None References: The conceptual foundation and subsequent implementation of the algorithms are found in [A. Sharma, A. Nakano, R.K. Kalia, P. Vashishta, S. Kodiyalam, P. Miller, W. Zhao, X.L. Liu, T.J. Campbell, A. Haas, Presence—Teleoperators and Virtual Environments 12 (1) (2003)].

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sayan Ghosh, Jeff Hammond

    OpenSHMEM is a community effort to unifyt and standardize the SHMEM programming model. MPI (Message Passing Interface) is a well-known community standard for parallel programming using distributed memory. The most recen t release of MPI, version 3.0, was designed in part to support programming models like SHMEM.OSHMPI is an implementation of the OpenSHMEM standard using MPI-3 for the Linux operating system. It is the first implementation of SHMEM over MPI one-sided communication and has the potential to be widely adopted due to the portability and widely availability of Linux and MPI-3. OSHMPI has been tested on a variety of systemsmore » and implementations of MPI-3, includingInfiniBand clusters using MVAPICH2 and SGI shared-memory supercomputers using MPICH. Current support is limited to Linux but may be extended to Apple OSX if there is sufficient interest. The code is opensource via https://github.com/jeffhammond/oshmpi« less

  18. Understanding Portability of a High-Level Programming Model on Contemporary Heterogeneous Architectures

    DOE PAGES

    Sabne, Amit J.; Sakdhnagool, Putt; Lee, Seyong; ...

    2015-07-13

    Accelerator-based heterogeneous computing is gaining momentum in the high-performance computing arena. However, the increased complexity of heterogeneous architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle this problem. Although the abstraction provided by OpenACC offers productivity, it raises questions concerning both functional and performance portability. In this article, the authors propose HeteroIR, a high-level, architecture-independent intermediate representation, to map high-level programming models, such as OpenACC, to heterogeneous architectures. They present a compiler approach that translates OpenACC programs into HeteroIR and accelerator kernels to obtain OpenACC functional portability. They then evaluate the performance portability obtained bymore » OpenACC with their approach on 12 OpenACC programs on Nvidia CUDA, AMD GCN, and Intel Xeon Phi architectures. They study the effects of various compiler optimizations and OpenACC program settings on these architectures to provide insights into the achieved performance portability.« less

  19. : A Scalable and Transparent System for Simulating MPI Programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perumalla, Kalyan S

    2010-01-01

    is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source-code is available. The set of source-code interfaces supported by is being expanded to support a wider set of applications, andmore » MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of purely discrete event style of execution, and due to the scalability and efficiency of the underlying parallel discrete event simulation engine, sik. In the largest runs, has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.« less

  20. Daylight control system device and method

    DOEpatents

    Paton, John Douglas

    2007-03-13

    A system and device for and a method of programming and controlling light fixtures is disclosed. A system in accordance with the present invention includes a stationary controller unit that is electrically coupled to the light fixtures. The stationary controller unit is configured to be remotely programmed with a portable commissioning device to automatically control the lights fixtures. The stationary controller unit and the portable commissioning device include light sensors, micro-computers and transceivers for measuring light levels, running programs, storing data and transmitting data between the stationary controller unit and the portable commissioning device. In operation, target light levels selected with the portable commissioning device and the controller unit is remotely programmed to automatically maintain the target level.

  1. Daylight control system, device and method

    DOEpatents

    Paton, John Douglas

    2012-08-28

    A system and device for and a method of programming and controlling light fixtures is disclosed. A system in accordance with the present invention includes a stationary controller unit that is electrically coupled to the light fixtures. The stationary controller unit is configured to be remotely programmed with a portable commissioning device to automatically control the lights fixtures. The stationary controller unit and the portable commissioning device include light sensors, micro-computers and transceivers for measuring light levels, running programs, storing data and transmitting data between the stationary controller unit and the portable commissioning device. In operation, target light levels selected with the portable commissioning device and the controller unit is remotely programmed to automatically maintain the target level.

  2. Daylight control system device and method

    DOEpatents

    Paton, John Douglas

    2009-12-01

    A system and device for and a method of programming and controlling light fixtures is disclosed. A system in accordance with the present invention includes a stationary controller unit that is electrically coupled to the light fixtures. The stationary controller unit is configured to be remotely programmed with a portable commissioning device to automatically control the lights fixtures. The stationary controller unit and the portable commissioning device include light sensors, micro-computers and transceivers for measuring light levels, running programs, storing data and transmitting data between the stationary controller unit and the portable commissioning device. In operation, target light levels selected with the portable commissioning device and the controller unit is remotely programmed to automatically maintain the target level.

  3. Optimisation of a parallel ocean general circulation model

    NASA Astrophysics Data System (ADS)

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  4. Portable Medical Laboratory Applications Software

    PubMed Central

    Silbert, Jerome A.

    1983-01-01

    Portability implies that a program can be run on a variety of computers with minimal software revision. The advantages of portability are outlined and design considerations for portable laboratory software are discussed. Specific approaches for achieving this goal are presented.

  5. ATDM LANL FleCSI: Topology and Execution Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bergen, Benjamin Karl

    FleCSI is a compile-time configurable C++ framework designed to support multi-physics application development. As such, FleCSI attempts to provide a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. This means that FleCSI is potentially useful to many different ECP projects. Current support includes multidimensional mesh topology, mesh geometry, and mesh adjacency information, n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures (to identify data dependencies between distributed-memory address spaces). FleCSI introduces a functional programming model with control, execution, and data abstractionsmore » that are consistent with state-of-the-art task-based runtimes such as Legion and Charm++. The model also provides support for fine-grained, data-parallel execution with backend support for runtimes such as OpenMP and C++17. The FleCSI abstraction layer provides the developer with insulation from the underlying runtimes, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to give developers a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to architectures and runtimes that arise in the future. This project is essential to the ECP Ristra Next-Generation Code project, part of ASC ATDM, because it provides a hierarchically parallel programming model that is consistent with the design of modern system architectures, but which allows for the straightforward expression of algorithmic parallelism in a portably performant manner.« less

  6. Quantitative determinations using portable Raman spectroscopy.

    PubMed

    Navin, Chelliah V; Tondepu, Chaitanya; Toth, Roxana; Lawson, Latevi S; Rodriguez, Jason D

    2017-03-20

    A portable Raman spectrometer was used to develop chemometric models to determine percent (%) drug release and potency for 500mg ciprofloxacin HCl tablets. Parallel dissolution and chromatographic experiments were conducted alongside Raman experiments to assess and compare the performance and capabilities of portable Raman instruments in determining critical drug attributes. All batches tested passed the 30min dissolution specification and the Raman model for drug release was able to essentially reproduce the dissolution profiles obtained by ultraviolet spectroscopy at 276nm for all five batches of the 500mg ciprofloxacin tablets. The five batches of 500mg ciprofloxacin tablets also passed the potency (assay) specification and the % label claim for the entire set of tablets run were nearly identical, 99.4±5.1 for the portable Raman method and 99.2±1.2 for the chromatographic method. The results indicate that portable Raman spectrometers can be used to perform quantitative analysis of critical product attributes of finished drug products. The findings of this study indicate that portable Raman may have applications in the areas of process analytical technology and rapid pharmaceutical surveillance. Published by Elsevier B.V.

  7. LLVM Infrastructure and Tools Project Summary

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCormick, Patrick Sean

    2017-11-06

    This project works with the open source LLVM Compiler Infrastructure (http://llvm.org) to provide tools and capabilities that address needs and challenges faced by ECP community (applications, libraries, and other components of the software stack). Our focus is on providing a more productive development environment that enables (i) improved compilation times and code generation for parallelism, (ii) additional features/capabilities within the design and implementations of LLVM components for improved platform/performance portability and (iii) improved aspects related to composition of the underlying implementation details of the programming environment, capturing resource utilization, overheads, etc. -- including runtime systems that are often not easilymore » addressed by application and library developers.« less

  8. Scalable Performance Environments for Parallel Systems

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Olson, Robert D.; Aydt, Ruth A.; Madhyastha, Tara M.; Birkett, Thomas; Jensen, David W.; Nazief, Bobby A. A.; Totty, Brian K.

    1991-01-01

    As parallel systems expand in size and complexity, the absence of performance tools for these parallel systems exacerbates the already difficult problems of application program and system software performance tuning. Moreover, given the pace of technological change, we can no longer afford to develop ad hoc, one-of-a-kind performance instrumentation software; we need scalable, portable performance analysis tools. We describe an environment prototype based on the lessons learned from two previous generations of performance data analysis software. Our environment prototype contains a set of performance data transformation modules that can be interconnected in user-specified ways. It is the responsibility of the environment infrastructure to hide details of module interconnection and data sharing. The environment is written in C++ with the graphical displays based on X windows and the Motif toolkit. It allows users to interconnect and configure modules graphically to form an acyclic, directed data analysis graph. Performance trace data are represented in a self-documenting stream format that includes internal definitions of data types, sizes, and names. The environment prototype supports the use of head-mounted displays and sonic data presentation in addition to the traditional use of visual techniques.

  9. Operations manual for portable profiler : installing and using the portable profiler.

    DOT National Transportation Integrated Search

    2010-06-01

    This manual is divided into two sections. The first is using the UTA-Profiler Program : with the portable profiler for generating surface profilers. The second is installing the : portable profiler module on a typical van or truck. The calibration an...

  10. Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput.

    PubMed

    Gierahn, Todd M; Wadsworth, Marc H; Hughes, Travis K; Bryson, Bryan D; Butler, Andrew; Satija, Rahul; Fortune, Sarah; Love, J Christopher; Shalek, Alex K

    2017-04-01

    Single-cell RNA-seq can precisely resolve cellular states, but applying this method to low-input samples is challenging. Here, we present Seq-Well, a portable, low-cost platform for massively parallel single-cell RNA-seq. Barcoded mRNA capture beads and single cells are sealed in an array of subnanoliter wells using a semipermeable membrane, enabling efficient cell lysis and transcript capture. We use Seq-Well to profile thousands of primary human macrophages exposed to Mycobacterium tuberculosis.

  11. Toward performance portability of the Albany finite element analysis code using the Kokkos library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.

    Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We presentmore » performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA General Processing Units (GPU’s), Intel Xeon Phis, and multicore CPUs.« less

  12. Toward performance portability of the Albany finite element analysis code using the Kokkos library

    DOE PAGES

    Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.; ...

    2018-02-05

    Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We presentmore » performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA General Processing Units (GPU’s), Intel Xeon Phis, and multicore CPUs.« less

  13. PRAIS: Distributed, real-time knowledge-based systems made easy

    NASA Technical Reports Server (NTRS)

    Goldstein, David G.

    1990-01-01

    This paper discusses an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS). PRAIS strives for transparently parallelizing production (rule-based) systems, even when under real-time constraints. PRAIS accomplishes these goals by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors.

  14. A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.

    1999-01-01

    The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.

  15. Proceedings of the second SISAL users` conference

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feo, J T; Frerking, C; Miller, P J

    1992-12-01

    This report contains papers on the following topics: A sisal code for computing the fourier transform on S{sub N}; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFT`s in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems;more » mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure ; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.« less

  16. User's guide to noise data acquisition and analysis programs for HP9845: Nicolet analyzers

    NASA Technical Reports Server (NTRS)

    Mcgary, M. C.

    1982-01-01

    A software interface package was written for use with a desktop computer and two models of single channel Fast Fourier analyzers. This software features a portable measurement and analysis system with several options. Two types of interface hardware can alternately be used in conjunction with the software. Either an IEEE-488 Bus interface or a 16-bit parallel system may be used. Two types of storage medium, either tape cartridge or floppy disc can be used with the software. Five types of data may be stored, plotted, and/or printed. The data types include time histories, narrow band power spectra, and narrow band, one-third octave band, or octave band sound pressure level. The data acquisition programming includes a front panel remote control option for the FFT analyzers. Data analysis options include choice of line type and pen color for plotting.

  17. A Parallel Processing Algorithm for Remote Sensing Classification

    NASA Technical Reports Server (NTRS)

    Gualtieri, J. Anthony

    2005-01-01

    A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level commuters using the Linux operating system. For example on the Medusa cluster at NASA/GSFC, this provides for super computing performance, 130 G(sub flops) (Linpack Benchmark) at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.

  18. Massively parallel and linear-scaling algorithm for second-order Møller-Plesset perturbation theory applied to the study of supramolecular wires

    NASA Astrophysics Data System (ADS)

    Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul

    2017-03-01

    We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.

  19. The ASC Sequoia Programming Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seager, M

    2008-08-06

    In the late 1980's and early 1990's, Lawrence Livermore National Laboratory was deeply engrossed in determining the next generation programming model for the Integrated Design Codes (IDC) beyond vectorization for the Cray 1s series of computers. The vector model, developed in mid 1970's first for the CDC 7600 and later extended from stack based vector operation to memory to memory operations for the Cray 1s, lasted approximately 20 years (See Slide 5). The Cray vector era was deemed an extremely long lived era as it allowed vector codes to be developed over time (the Cray 1s were faster in scalarmore » mode than the CDC 7600) with vector unit utilization increasing incrementally over time. The other attributes of the Cray vector era at LLNL were that we developed, supported and maintained the Operating System (LTSS and later NLTSS), communications protocols (LINCS), Compilers (Civic Fortran77 and Model), operating system tools (e.g., batch system, job control scripting, loaders, debuggers, editors, graphics utilities, you name it) and math and highly machine optimized libraries (e.g., SLATEC, and STACKLIB). Although LTSS was adopted by Cray for early system generations, they later developed COS and UNICOS operating systems and environment on their own. In the late 1970s and early 1980s two trends appeared that made the Cray vector programming model (described above including both the hardware and system software aspects) seem potentially dated and slated for major revision. These trends were the appearance of low cost CMOS microprocessors and their attendant, departmental and mini-computers and later workstations and personal computers. With the wide spread adoption of Unix in the early 1980s, it appeared that LLNL (and the other DOE Labs) would be left out of the mainstream of computing without a rapid transition to these 'Killer Micros' and modern OS and tools environments. The other interesting advance in the period is that systems were being developed with multiple 'cores' in them and called Symmetric Multi-Processor or Shared Memory Processor (SMP) systems. The parallel revolution had begun. The Laboratory started a small 'parallel processing project' in 1983 to study the new technology and its application to scientific computing with four people: Tim Axelrod, Pete Eltgroth, Paul Dubois and Mark Seager. Two years later, Eugene Brooks joined the team. This team focused on Unix and 'killer micro' SMPs. Indeed, Eugene Brooks was credited with coming up with the 'Killer Micro' term. After several generations of SMP platforms (e.g., Sequent Balance 8000 with 8 33MHz MC32032s, Allian FX8 with 8 MC68020 and FPGA based Vector Units and finally the BB&N Butterfly with 128 cores), it became apparent to us that the killer micro revolution would indeed take over Crays and that we definitely needed a new programming and systems model. The model developed by Mark Seager and Dale Nielsen focused on both the system aspects (Slide 3) and the code development aspects (Slide 4). Although now succinctly captured in two attached slides, at the time there was tremendous ferment in the research community as to what parallel programming model would emerge, dominate and survive. In addition, we wanted a model that would provide portability between platforms of a single generation but also longevity over multiple--and hopefully--many generations. Only after we developed the 'Livermore Model' and worked it out in considerable detail did it become obvious that what we came up with was the right approach. In a nutshell, the applications programming model of the Livermore Model posited that SMP parallelism would ultimately not scale indefinitely and one would have to bite the bullet and implement MPI parallelism within the Integrated Design Code (IDC). We also had a major emphasis on doing everything in a completely standards based, portable methodology with POSIX/Unix as the target environment. We decided against specialized libraries like STACKLIB for performance, but kept as many general purpose, portable math libraries as were needed by the codes. Third, we assumed that the SMPs in clusters would evolve in time to become more powerful, feature rich and, in particular, offer more cores. Thus, we focused on OpenMP, and POSIX PThreads for programming SMP parallelism. These code porting efforts were lead by Dale Nielsen, A-Division code group leader, and Randy Christensen, B-Division code group leader. Most of the porting effort revolved removing 'Crayisms' in the codes: artifacts of LTSS/NLTSS, Civic compiler extensions beyond Fortran77, IO libraries and dealing with new code control languages (we switched to Perl and later to Python). Adding MPI to the codes was initially problematic and error prone because the programmers used MPI directly and sprinkled the calls throughout the code.« less

  20. Parallelization and checkpointing of GPU applications through program transformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solano-Quinde, Lizandro Damian

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solvemore » the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.« less

  1. An in-line spectrophotometer on a centrifugal microfluidic platform for real-time protein determination and calibration.

    PubMed

    Ding, Zhaoxiong; Zhang, Dongying; Wang, Guanghui; Tang, Minghui; Dong, Yumin; Zhang, Yixin; Ho, Ho-Pui; Zhang, Xuping

    2016-09-21

    In this paper, an in-line, low-cost, miniature and portable spectrophotometric detection system is presented and used for fast protein determination and calibration in centrifugal microfluidics. Our portable detection system is configured with paired emitter and detector diodes (PEDD), where the light beam between both LEDs is collimated with enhanced system tolerance. It is the first time that a physical model of PEDD is clearly presented, which could be modelled as a photosensitive RC oscillator. A portable centrifugal microfluidic system that contains a wireless port in real-time communication with a smartphone has been built to show that PEDD is an effective strategy for conducting rapid protein bioassays with detection performance comparable to that of a UV-vis spectrophotometer. The choice of centrifugal microfluidics offers the unique benefits of highly parallel fluidic actuation at high accuracy while there is no need for a pump, as inertial forces are present within the entire spinning disc and accurately controlled by varying the spinning speed. As a demonstration experiment, we have conducted the Bradford assay for bovine serum albumin (BSA) concentration calibration from 0 to 2 mg mL(-1). Moreover, a novel centrifugal disc with a spiral microchannel is proposed for automatic distribution and metering of the sample to all the parallel reactions at one time. The reported lab-on-a-disc scheme with PEDD detection may offer a solution for high-throughput assays, such as protein density calibration, drug screening and drug solubility measurement that require the handling of a large number of reactions in parallel.

  2. A portable 12-wavelength parallel near-infrared spectral tomography (NIRST) system for efficient characterization of breast cancer during neoadjuvant chemotherapy

    NASA Astrophysics Data System (ADS)

    Zhao, Yan; Burger, William R.; Zhou, Mingwei; Pogue, Brian W.; Paulsen, Keith D.; Jiang, Shudong

    2017-02-01

    A portable, 12-wavelength hybrid frequency domain (FD) and continuous wave (CW) near-infrared spectral tomography (NIRST) system was developed for efficient characterization of breast cancer in a clinical oncology setting. Two sets of three FD and three CW measurements were acquired simultaneously. The imaging time was reduced from 90 to 55 seconds with a new gain adjustment scheme of the optical detector. The study of integrating this system into the workflow of clinical oncology practice is ongoing.

  3. Copy Hiding Application Interface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, Holger; Poliakoff, David; Robinson, Peter

    2016-10-06

    CHAI is a light-weight framework which abstracts the automated movement of data (e.g. to/from Host/Device) via RAJA like performance portability programming model constructs. It can be viewed as a utility framework and an adjunct to FAJA (A Performance Portability Framework). Performance Portability is a technique that abstracts the complexities of modern Heterogeneous Architectures while allowing the original program to undergo incremental minimally invasive code changes in order to adapt to the newer architectures.

  4. Scalability and Portability of Two Parallel Implementations of ADI

    NASA Technical Reports Server (NTRS)

    Phung, Thanh; VanderWijngaart, Rob F.

    1994-01-01

    Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.

  5. Design and implementation of a smartphone-based portable ultrasound pulsed-wave Doppler device for blood flow measurement.

    PubMed

    Huang, Chih-Chung; Lee, Po-Yang; Chen, Pay-Yu; Liu, Ting-Yu

    2012-01-01

    Blood flow measurement using Doppler ultrasound has become a useful tool for diagnosing cardiovascular diseases and as a physiological monitor. Recently, pocket-sized ultrasound scanners have been introduced for portable diagnosis. The present paper reports the implementation of a portable ultrasound pulsed-wave (PW) Doppler flowmeter using a smartphone. A 10-MHz ultrasonic surface transducer was designed for the dynamic monitoring of blood flow velocity. The directional baseband Doppler shift signals were obtained using a portable analog circuit system. After hardware processing, the Doppler signals were fed directly to a smartphone for Doppler spectrogram analysis and display in real time. To the best of our knowledge, this is the first report of the use of this system for medical ultrasound Doppler signal processing. A Couette flow phantom, consisting of two parallel disks with a 2-mm gap, was used to evaluate and calibrate the device. Doppler spectrograms of porcine blood flow were measured using this stand-alone portable device under the pulsatile condition. Subsequently, in vivo portable system verification was performed by measuring the arterial blood flow of a rat and comparing the results with the measurement from a commercial ultrasound duplex scanner. All of the results demonstrated the potential for using a smartphone as a novel embedded system for portable medical ultrasound applications. © 2012 IEEE

  6. Design and Fabrication of Multifunctional Portable Bi2Te3-Based Thermoelectric Camping Lamp

    NASA Astrophysics Data System (ADS)

    Zhou, Yi; Li, Gongping

    2018-05-01

    Camping lamps have been widely used in the lighting, power supply, and intelligent electronic equipment fields. However, applications of traditional chemical and solar camping lamps are largely limited by the physical size of the source and operating conditions. A new prototype multifunctional portable Bi2Te3-based thermoelectric camping lamp (TECL) has been designed and fabricated. Ten parallel light-emitting diodes were lit directly by a Bi2Te3-based thermoelectric generator (TEG). The highest short-circuit current of 0.38 A and open-circuit voltage of 4.2 V were obtained at temperature difference of 115 K. This TECL is attractive for use in multifunctional and extreme applications as it integrates a portable heat source, high-performance TEG, and power management unit.

  7. Design and Fabrication of Multifunctional Portable Bi2Te3-Based Thermoelectric Camping Lamp

    NASA Astrophysics Data System (ADS)

    Zhou, Yi; Li, Gongping

    2018-07-01

    Camping lamps have been widely used in the lighting, power supply, and intelligent electronic equipment fields. However, applications of traditional chemical and solar camping lamps are largely limited by the physical size of the source and operating conditions. A new prototype multifunctional portable Bi2Te3-based thermoelectric camping lamp (TECL) has been designed and fabricated. Ten parallel light-emitting diodes were lit directly by a Bi2Te3-based thermoelectric generator (TEG). The highest short-circuit current of 0.38 A and open-circuit voltage of 4.2 V were obtained at temperature difference of 115 K. This TECL is attractive for use in multifunctional and extreme applications as it integrates a portable heat source, high-performance TEG, and power management unit.

  8. UPEML: a machine-portable CDC Update emulator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mehlhorn, T.A.; Young, M.F.

    1984-12-01

    UPEML is a machine-portable CDC Update emulation program. UPEML is written in ANSI standard Fortran-77 and is relatively simple and compact. It is capable of emulating a significant subset of the standard CDC Update functions including program library creation and subsequent modification. Machine-portability is an essential attribute of UPEML. It was written primarily to facilitate the use of CDC-based scientific packages on alternate computer systems such as the VAX 11/780 and the IBM 3081.

  9. Milestone Completion Report STCO04-1 AAPS: engagements with code teams, vendors, collaborators, developers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Draeger, E. W.

    The Advanced Architecture and Portability Specialists team (AAPS) worked with a select set of LLNL application teams to develop and/or implement a portability strategy for next-generation architectures. The team also investigated new and updated programming models and helped develop programming abstractions targeting maintainability and performance portability. Significant progress was made on both fronts in FY17, resulting in multiple applications being significantly more prepared for the nextgeneration machines than before.

  10. Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George Widgery

    We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-byblocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms i.e., Kokkos. A performance evaluation is presented onmore » both Intel Sandybridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Choleskyby- blocks and 19.2x speedup over serial Cholesky performance which does not carry tasking overhead using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.« less

  11. Polymerase chain reaction system

    DOEpatents

    Benett, William J.; Richards, James B.; Stratton, Paul L.; Hadley, Dean R.; Milanovich, Fred P.; Belgrader, Phil; Meyer, Peter L.

    2004-03-02

    A portable polymerase chain reaction DNA amplification and detection system includes one or more chamber modules. Each module supports a duplex assay of a biological sample. Each module has two parallel interrogation ports with a linear optical system. The system is capable of being handheld.

  12. Seismic imaging using finite-differences and parallel computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ober, C.C.

    1997-12-31

    A key to reducing the risks and costs of associated with oil and gas exploration is the fast, accurate imaging of complex geologies, such as salt domes in the Gulf of Mexico and overthrust regions in US onshore regions. Prestack depth migration generally yields the most accurate images, and one approach to this is to solve the scalar wave equation using finite differences. As part of an ongoing ACTI project funded by the US Department of Energy, a finite difference, 3-D prestack, depth migration code has been developed. The goal of this work is to demonstrate that massively parallel computersmore » can be used efficiently for seismic imaging, and that sufficient computing power exists (or soon will exist) to make finite difference, prestack, depth migration practical for oil and gas exploration. Several problems had to be addressed to get an efficient code for the Intel Paragon. These include efficient I/O, efficient parallel tridiagonal solves, and high single-node performance. Furthermore, to provide portable code the author has been restricted to the use of high-level programming languages (C and Fortran) and interprocessor communications using MPI. He has been using the SUNMOS operating system, which has affected many of his programming decisions. He will present images created from two verification datasets (the Marmousi Model and the SEG/EAEG 3D Salt Model). Also, he will show recent images from real datasets, and point out locations of improved imaging. Finally, he will discuss areas of current research which will hopefully improve the image quality and reduce computational costs.« less

  13. Use Computer-Aided Tools to Parallelize Large CFD Applications

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Yan, J.

    2000-01-01

    Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progressing in hardware architectures and increasing complexity of real applications in recent years, the problem becomes even more sever. Today, scalability and high performance are mostly involving handwritten parallel programs using message-passing libraries (e.g. MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, show good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance. In the worst cases, it can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often failed on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome the deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization. CAPO is aimed at taking advantage of detailed inter-procedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on NAS Benchmarks and ARC3D have demonstrated good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving Navier-Stokes equations with complicated boundary conditions and turbulence model in multiple zones. Each one comprises of from 50K to 1,00k lines of FORTRAN77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175MHz, R10K processor). A fair amount of effort was spent on correcting false dependencies due to lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Due to sequential algorithms involved, code sections in TLNS3D and INS3D need to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in single zone. The MPI data points for the small test case were taken from a handcoded MPI version. As we can see, CAPO's version has achieved 18 fold speed up on 32 nodes of the SGI O2K. For the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outer- most parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks the support of parallelization at the multi-zone level. Future work will emphasize on the development of methodology to work in a multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformation is also needed.

  14. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  15. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  16. Massively parallel and linear-scaling algorithm for second-order Moller–Plesset perturbation theory applied to the study of supramolecular wires

    DOE PAGES

    Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...

    2016-11-16

    Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalabilitymore » of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Moller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.« less

  17. 40 CFR 204.52 - Portable air compressor noise emission standard.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 25 2011-07-01 2011-07-01 false Portable air compressor noise emission standard. 204.52 Section 204.52 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) NOISE ABATEMENT PROGRAMS NOISE EMISSION STANDARDS FOR CONSTRUCTION EQUIPMENT Portable Air Compressors § 204.52...

  18. 40 CFR 204.52 - Portable air compressor noise emission standard.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 24 2010-07-01 2010-07-01 false Portable air compressor noise emission standard. 204.52 Section 204.52 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) NOISE ABATEMENT PROGRAMS NOISE EMISSION STANDARDS FOR CONSTRUCTION EQUIPMENT Portable Air Compressors § 204.52...

  19. 24 CFR 984.306 - Section 8 residency and portability requirements.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... URBAN DEVELOPMENT SECTION 8 AND PUBLIC HOUSING FAMILY SELF-SUFFICIENCY PROGRAM Program Operation § 984... 24 Housing and Urban Development 4 2010-04-01 2010-04-01 false Section 8 residency and portability requirements. 984.306 Section 984.306 Housing and Urban Development Regulations Relating to Housing and Urban...

  20. Clarity: An Open Source Manager for Laboratory Automation

    PubMed Central

    Delaney, Nigel F.; Echenique, José Rojas; Marx, Christopher J.

    2013-01-01

    Software to manage automated laboratories interfaces with hardware instruments, gives users a way to specify experimental protocols, and schedules activities to avoid hardware conflicts. In addition to these basics, modern laboratories need software that can run multiple different protocols in parallel and that can be easily extended to interface with a constantly growing diversity of techniques and instruments. We present Clarity: a laboratory automation manager that is hardware agnostic, portable, extensible and open source. Clarity provides critical features including remote monitoring, robust error reporting by phone or email, and full state recovery in the event of a system crash. We discuss the basic organization of Clarity; demonstrate an example of its implementation for the automated analysis of bacterial growth; and describe how the program can be extended to manage new hardware. Clarity is mature; well documented; actively developed; written in C# for the Common Language Infrastructure; and is free and open source software. These advantages set Clarity apart from currently available laboratory automation programs. PMID:23032169

  1. Portable power source needs of the future Army -- Batteries and fuel cells

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacobs, R.; Christopher, H.; Hamlen, R.

    This paper describes the US Army`s future needs for silent portable power in the area of batteries and fuel cells. These needs will continue to increase as a result of the introduction of newer types of equipment, the increasing digitization of the battlefield, and future integrated Soldier Systems. Current battery programs are aimed at improved, low-cost primary batteries, and rechargeable batteries with increased energy densities. The Army fuel cell program aimed at portable systems capable of the order of 150W is also described.

  2. Ada Compiler Validation Summary Report: Certificate Number: 920509S1. 11259 Alenia Aeritalia and Selenia S.p.A DACS VAX/VMS to 80x86 PM MARA Ada Cross Compiler, Version 4.6 Microvax 4000/200 = MARA

    DTIC Science & Technology

    1992-01-01

    area end basic io types; 11.3.3 TEXT 10 -- Date 31 October 1983 -- Programmer Soeren Prehn (, Knud Joergen Kirkegaard) -- Project Portable Ada...Programmer Peter Haff (, Soeren Prehn , Knud Joergen Kirkegaard) -- Project Portable Ada Programming System -- Module SEQIOS.ADA -- Description...Peter Haff (,Soeren Prehn , Knud Joergen Kirkegaard) -- Project Portable Ada Programming System -- Module DIR IO.ADA -- Description Specification of

  3. CFD Analysis and Design Optimization Using Parallel Computers

    NASA Technical Reports Server (NTRS)

    Martinelli, Luigi; Alonso, Juan Jose; Jameson, Antony; Reuther, James

    1997-01-01

    A versatile and efficient multi-block method is presented for the simulation of both steady and unsteady flow, as well as aerodynamic design optimization of complete aircraft configurations. The compressible Euler and Reynolds Averaged Navier-Stokes (RANS) equations are discretized using a high resolution scheme on body-fitted structured meshes. An efficient multigrid implicit scheme is implemented for time-accurate flow calculations. Optimum aerodynamic shape design is achieved at very low cost using an adjoint formulation. The method is implemented on parallel computing systems using the MPI message passing interface standard to ensure portability. The results demonstrate that, by combining highly efficient algorithms with parallel computing, it is possible to perform detailed steady and unsteady analysis as well as automatic design for complex configurations using the present generation of parallel computers.

  4. A low-cost and portable realization on fringe projection three-dimensional measurement

    NASA Astrophysics Data System (ADS)

    Xiao, Suzhi; Tao, Wei; Zhao, Hui

    2015-12-01

    Fringe projection three-dimensional measurement is widely applied in a wide range of industrial application. The traditional fringe projection system has the disadvantages of high expense, big size, and complicated calibration requirements. In this paper we introduce a low-cost and portable realization on three-dimensional measurement with Pico projector. It has the advantages of low cost, compact physical size, and flexible configuration. For the proposed fringe projection system, there is no restriction to camera and projector's relative alignment on parallelism and perpendicularity for installation. Moreover, plane-based calibration method is adopted in this paper that avoids critical requirements on calibration system such as additional gauge block or precise linear z stage. What is more, error sources existing in the proposed system are introduced in this paper. The experimental results demonstrate the feasibility of the proposed low cost and portable fringe projection system.

  5. Comparison of soil pollution concentrations determined using AAS and portable XRF techniques.

    PubMed

    Radu, Tanja; Diamond, Dermot

    2009-11-15

    Past mining activities in the area of Silvermines, Ireland, have resulted in heavily polluted soils. The possibility of spreading pollution to the surrounding areas through dust blow-offs poses a potential threat for the local communities. Conventional environmental soil and dust analysis techniques are very slow and laborious and consequently there is a need for fast and accurate analytical methods, which can provide real-time in situ pollution mapping. Laboratory-based aqua regia acid digestion of the soil samples collected in the area followed by the atomic absorption spectrophotometry (AAS) analysis confirmed very high pollution, especially by Pb, As, Cu, and Zn. In parallel, samples were analyzed using portable X-ray fluorescence radioisotope and miniature tube powered (XRF) NITON instruments and their performance was compared. Overall, the portable XRF instrument gave excellent correlation with the laboratory-based reference AAS method.

  6. Technology assessment of portable energy RDT and P, phase 1

    NASA Technical Reports Server (NTRS)

    Spraul, J. R. (Compiler)

    1975-01-01

    A technological assessment of portable energy research, development, technology, and production was undertaken to assess the technical, economic, environmental, and sociopolitical issues associated with portable energy options. Those courses of action are discussed which would impact aviation and air transportation research and technology. Technology assessment workshops were held to develop problem statements. The eighteen portable energy problem statements are discussed in detail along with each program's objective, approach, task description, and estimates of time and costs.

  7. rfpipe: Radio interferometric transient search pipeline

    NASA Astrophysics Data System (ADS)

    Law, Casey J.

    2017-10-01

    rfpipe supports Python-based analysis of radio interferometric data (especially from the Very Large Array) and searches for fast radio transients. This extends on the rtpipe library (ascl:1706.002) with new approaches to parallelization, acceleration, and more portable data products. rfpipe can run in standalone mode or be in a cluster environment.

  8. OpenACC acceleration of an unstructured CFD solver based on a reconstructed discontinuous Galerkin method for compressible flows

    DOE PAGES

    Xia, Yidong; Lou, Jialin; Luo, Hong; ...

    2015-02-09

    Here, an OpenACC directive-based graphics processing unit (GPU) parallel scheme is presented for solving the compressible Navier–Stokes equations on 3D hybrid unstructured grids with a third-order reconstructed discontinuous Galerkin method. The developed scheme requires the minimum code intrusion and algorithm alteration for upgrading a legacy solver with the GPU computing capability at very little extra effort in programming, which leads to a unified and portable code development strategy. A face coloring algorithm is adopted to eliminate the memory contention because of the threading of internal and boundary face integrals. A number of flow problems are presented to verify the implementationmore » of the developed scheme. Timing measurements were obtained by running the resulting GPU code on one Nvidia Tesla K20c GPU card (Nvidia Corporation, Santa Clara, CA, USA) and compared with those obtained by running the equivalent Message Passing Interface (MPI) parallel CPU code on a compute node (consisting of two AMD Opteron 6128 eight-core CPUs (Advanced Micro Devices, Inc., Sunnyvale, CA, USA)). Speedup factors of up to 24× and 1.6× for the GPU code were achieved with respect to one and 16 CPU cores, respectively. The numerical results indicate that this OpenACC-based parallel scheme is an effective and extensible approach to port unstructured high-order CFD solvers to GPU computing.« less

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hornung, Richard D.; Hones, Holger E.

    The RAJA Performance Suite is designed to evaluate performance of the RAJA performance portability library on a wide variety of important high performance computing (HPC) algorithmic lulmels. These kernels assess compiler optimizations and various parallel programming model backends accessible through RAJA, such as OpenMP, CUDA, etc. The Initial version of the suite contains 25 computational kernels, each of which appears in 6 variants: Baseline SequcntiaJ, RAJA SequentiaJ, Baseline OpenMP, RAJA OpenMP, Baseline CUDA, RAJA CUDA. All variants of each kernel perform essentially the same mathematical operations and the loop body code for each kernel is identical across all variants. Theremore » are a few kernels, such as those that contain reduction operations, that require CUDA-specific coding for their CUDA variants. ActuaJ computer instructions executed and how they run in parallel differs depending on the parallel programming model backend used and which optimizations are perfonned by the compiler used to build the Perfonnance Suite executable. The Suite will be used primarily by RAJA developers to perform regular assessments of RAJA performance across a range of hardware platforms and compilers as RAJA features are being developed. It will also be used by LLNL hardware and software vendor panners for new defining requirements for future computing platform procurements and acceptance testing. In particular, the RAJA Performance Suite will be used for compiler acceptance testing of the upcoming CORAUSierra machine {initial LLNL delivery expected in late-2017/early 2018) and the CORAL-2 procurement. The Suite will aJso be used to generate concise source code reproducers of compiler and runtime issues we uncover so that we may provide them to relevant vendors to be fixed.« less

  10. Portable instrument for inspecting irradiated nuclear fuel assemblies

    DOEpatents

    Nicholson, Nicholas; Dowdy, Edward J.; Holt, David M.; Stump, Jr., Charles J.

    1985-01-01

    A portable instrument for measuring induced Cerenkov radiation associated with irradiated nuclear fuel assemblies in a water-filled storage pond is disclosed. The instrument includes a photomultiplier tube and an image intensifier which are operable in parallel and simultaneously by means of a field lens assembly and an associated beam splitter. The image intensifier permits an operator to aim and focus the apparatus on a submerged fuel assembly. Once the instrument is aimed and focused, an illumination reading can be obtained with the photomultiplier tube. The instrument includes a lens cap with a carbon-14/phosphor light source for calibrating the apparatus in the field.

  11. Embedded Data Processor and Portable Computer Technology testbeds

    NASA Technical Reports Server (NTRS)

    Alena, Richard; Liu, Yuan-Kwei; Goforth, Andre; Fernquist, Alan R.

    1993-01-01

    Attention is given to current activities in the Embedded Data Processor and Portable Computer Technology testbed configurations that are part of the Advanced Data Systems Architectures Testbed at the Information Sciences Division at NASA Ames Research Center. The Embedded Data Processor Testbed evaluates advanced microprocessors for potential use in mission and payload applications within the Space Station Freedom Program. The Portable Computer Technology (PCT) Testbed integrates and demonstrates advanced portable computing devices and data system architectures. The PCT Testbed uses both commercial and custom-developed devices to demonstrate the feasibility of functional expansion and networking for portable computers in flight missions.

  12. Compiled MPI: Cost-Effective Exascale Applications Development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronevetsky, G; Quinlan, D; Lumsdaine, A

    2012-04-10

    The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardwaremore » procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application. We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) New set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) A compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) Novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) A novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most complex code annotations.« less

  13. Linear-Algebra Programs

    NASA Technical Reports Server (NTRS)

    Lawson, C. L.; Krogh, F. T.; Gold, S. S.; Kincaid, D. R.; Sullivan, J.; Williams, E.; Hanson, R. J.; Haskell, K.; Dongarra, J.; Moler, C. B.

    1982-01-01

    The Basic Linear Algebra Subprograms (BLAS) library is a collection of 38 FORTRAN-callable routines for performing basic operations of numerical linear algebra. BLAS library is portable and efficient source of basic operations for designers of programs involving linear algebriac computations. BLAS library is supplied in portable FORTRAN and Assembler code versions for IBM 370, UNIVAC 1100 and CDC 6000 series computers.

  14. Advances in developing rapid, reliable and portable detection systems for alcohol.

    PubMed

    Thungon, Phurpa Dema; Kakoti, Ankana; Ngashangva, Lightson; Goswami, Pranab

    2017-11-15

    Development of portable, reliable, sensitive, simple, and inexpensive detection system for alcohol has been an instinctive demand not only in traditional brewing, pharmaceutical, food and clinical industries but also in rapidly growing alcohol based fuel industries. Highly sensitive, selective, and reliable alcohol detections are currently amenable typically through the sophisticated instrument based analyses confined mostly to the state-of-art analytical laboratory facilities. With the growing demand of rapid and reliable alcohol detection systems, an all-round attempt has been made over the past decade encompassing various disciplines from basic and engineering sciences. Of late, the research for developing small-scale portable alcohol detection system has been accelerated with the advent of emerging miniaturization techniques, advanced materials and sensing platforms such as lab-on-chip, lab-on-CD, lab-on-paper etc. With these new inter-disciplinary approaches along with the support from the parallel knowledge growth on rapid detection systems being pursued for various targets, the progress on translating the proof-of-concepts to commercially viable and environment friendly portable alcohol detection systems is gaining pace. Here, we summarize the progress made over the years on the alcohol detection systems, with a focus on recent advancement towards developing portable, simple and efficient alcohol sensors. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Computing NLTE Opacities -- Node Level Parallel Calculation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holladay, Daniel

    Presentation. The goal: to produce a robust library capable of computing reasonably accurate opacities inline with the assumption of LTE relaxed (non-LTE). Near term: demonstrate acceleration of non-LTE opacity computation. Far term (if funded): connect to application codes with in-line capability and compute opacities. Study science problems. Use efficient algorithms that expose many levels of parallelism and utilize good memory access patterns for use on advanced architectures. Portability to multiple types of hardware including multicore processors, manycore processors such as KNL, GPUs, etc. Easily coupled to radiation hydrodynamics and thermal radiative transfer codes.

  16. Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Ye; Ma, Xiaosong; Liu, Qing Gary

    2015-01-01

    Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time-and labor-intensive to create. Real applications themselves, while offering most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters tomore » create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.« less

  17. A parallel and modular deformable cell Car-Parrinello code

    NASA Astrophysics Data System (ADS)

    Cavazzoni, Carlo; Chiarotti, Guido L.

    1999-12-01

    We have developed a modular parallel code implementing the Car-Parrinello [Phys. Rev. Lett. 55 (1985) 2471] algorithm including the variable cell dynamics [Europhys. Lett. 36 (1994) 345; J. Phys. Chem. Solids 56 (1995) 510]. Our code is written in Fortran 90, and makes use of some new programming concepts like encapsulation, data abstraction and data hiding. The code has a multi-layer hierarchical structure with tree like dependences among modules. The modules include not only the variables but also the methods acting on them, in an object oriented fashion. The modular structure allows easier code maintenance, develop and debugging procedures, and is suitable for a developer team. The layer structure permits high portability. The code displays an almost linear speed-up in a wide range of number of processors independently of the architecture. Super-linear speed up is obtained with a "smart" Fast Fourier Transform (FFT) that uses the available memory on the single node (increasing for a fixed problem with the number of processing elements) as temporary buffer to store wave function transforms. This code has been used to simulate water and ammonia at giant planet conditions for systems as large as 64 molecules for ˜50 ps.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Seyong; Kim, Jungwon; Vetter, Jeffrey S

    This paper presents a directive-based, high-level programming framework for high-performance reconfigurable computing. It takes a standard, portable OpenACC C program as input and generates a hardware configuration file for execution on FPGAs. We implemented this prototype system using our open-source OpenARC compiler; it performs source-to-source translation and optimization of the input OpenACC program into an OpenCL code, which is further compiled into a FPGA program by the backend Altera Offline OpenCL compiler. Internally, the design of OpenARC uses a high- level intermediate representation that separates concerns of program representation from underlying architectures, which facilitates portability of OpenARC. In fact, thismore » design allowed us to create the OpenACC-to-FPGA translation framework with minimal extensions to our existing system. In addition, we show that our proposed FPGA-specific compiler optimizations and novel OpenACC pragma extensions assist the compiler in generating more efficient FPGA hardware configuration files. Our empirical evaluation on an Altera Stratix V FPGA with eight OpenACC benchmarks demonstrate the benefits of our strategy. To demonstrate the portability of OpenARC, we show results for the same benchmarks executing on other heterogeneous platforms, including NVIDIA GPUs, AMD GPUs, and Intel Xeon Phis. This initial evidence helps support the goal of using a directive-based, high-level programming strategy for performance portability across heterogeneous HPC architectures.« less

  19. A Portable Cell Maintenance System for Rapid Toxicity Monitoring Final Report CRADA No. TC-02081-04

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kane, S.; Zhou, P.

    The Phase I STTR research project was targeted at meeting the objectives and requirements stated in STTR solicitation A04-T028 for a Portable Cell Maintenance System for Rapid Toxicity Monitoring. In accordance with the requirements for STTR programs, collaboration was formed between a small business, Kionix, Inc., and The Regents of the University of California, Lawrence Livermore National Laboratory (LLNL). The collaboration included CytoDiscovery, Inc. (CDI) which, in collaboration with Kionix, provided access to membrane chip technology and provided program support and coordination. The objective of the overall program (excerpted from the original solicitation) was: “To develop a small, portable cellmore » maintenance system for the transport, storage, and monitoring of viable vertebrate cells and tissues.” The goal of the Phase I project was to demonstrate the feasibility of achieving the program objectives utilizing a system comprised of a small-size, microfluidic chip-based cell maintenance cartridge (CMC) and a portable cell maintenance system (CMS) capable of housing a minimum of four CMCs. The system was designed to be capable of optimally maintaining multiple vertebrate cell types while supporting a wide variety of cellular assays.« less

  20. ASC Tri-lab Co-design Level 2 Milestone Report 2015

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hornung, Rich; Jones, Holger; Keasler, Jeff

    2015-09-23

    In 2015, the three Department of Energy (DOE) National Laboratories that make up the Advanced Sci- enti c Computing (ASC) Program (Sandia, Lawrence Livermore, and Los Alamos) collaboratively explored performance portability programming environments in the context of several ASC co-design proxy applica- tions as part of a tri-lab L2 milestone executed by the co-design teams at each laboratory. The programming environments that were studied included Kokkos (developed at Sandia), RAJA (LLNL), and Legion (Stan- ford University). The proxy apps studied included: miniAero, LULESH, CoMD, Kripke, and SNAP. These programming models and proxy-apps are described herein. Each lab focused on amore » particular combination of abstractions and proxy apps, with the goal of assessing performance portability using those. Performance portability was determined by: a) the ability to run a single application source code on multiple advanced architectures, b) comparing runtime performance between \

  1. NDL-v2.0: A new version of the numerical differentiation library for parallel architectures

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.

    2014-07-01

    We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library has been based on the lightweight OpenMP tasking model allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes. Catalog identifier: AEDG_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 63036 No. of bytes in distributed program, including test data, etc.: 801872 Distribution format: tar.gz Programming language: ANSI Fortran-77, ANSI C, Python. Computer: Distributed systems (clusters), shared memory systems. Operating system: Linux, Unix. Has the code been vectorized or parallelized?: Yes. RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N2) internal storage for Hessian calculations, if a task throttling factor has not been set by the user. Classification: 4.9, 4.14, 6.5. Catalog identifier of previous version: AEDG_v1_0 Journal reference of previous version: Comput. Phys. Comm. 180(2009)1404 Does the new version supersede the previous version?: Yes Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, and sensitivity analysis. For a large number of scientific and engineering applications, the underlying functions correspond to simulation codes for which analytical estimation of derivatives is difficult or almost impossible. A parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Reasons for new version: The updated version was motivated by our endeavors to extend a parallel Bayesian uncertainty quantification framework [1], by incorporating higher order derivative information as in most state-of-the-art stochastic simulation methods such as Stochastic Newton MCMC [2] and Riemannian Manifold Hamiltonian MC [3]. The function evaluations are simulations with significant time-to-solution, which also varies with the input parameters such as in [1, 4]. The runtime of the N-body-type of problem changes considerably with the introduction of a longer cut-off between the bodies. In the first version of the library, the OpenMP-parallel subroutines spawn a new team of threads and distribute the function evaluations with a PARALLEL DO directive. This limits the functionality of the library as multiple concurrent calls require nested parallelism support from the OpenMP environment. Therefore, either their function evaluations will be serialized or processor oversubscription is likely to occur due to the increased number of OpenMP threads. In addition, the Hessian calculations include two explicit parallel regions that compute first the diagonal and then the off-diagonal elements of the array. Due to the barrier between the two regions, the parallelism of the calculations is not fully exploited. These issues have been addressed in the new version by first restructuring the serial code and then running the function evaluations in parallel using OpenMP tasks. Although the MPI-parallel implementation of the first version is capable of fully exploiting the task parallelism of the PNDL routines, it does not utilize the caching mechanism of the serial code and, therefore, performs some redundant function evaluations in the Hessian and Jacobian calculations. This can lead to: (a) higher execution times if the number of available processors is lower than the total number of tasks, and (b) significant energy consumption due to wasted processor cycles. Overcoming these drawbacks, which become critical as the time of a single function evaluation increases, was the primary goal of this new version. Due to the code restructure, the MPI-parallel implementation (and the OpenMP-parallel in accordance) avoids redundant calls, providing optimal performance in terms of the number of function evaluations. Another limitation of the library was that the library subroutines were collective and synchronous calls. In the new version, each MPI process can issue any number of subroutines for asynchronous execution. We introduce two library calls that provide global and local task synchronizations, similarly to the BARRIER and TASKWAIT directives of OpenMP. The new MPI-implementation is based on TORC, a new tasking library for multicore clusters [5-7]. TORC improves the portability of the software, as it relies exclusively on the POSIX-Threads and MPI programming interfaces. It allows MPI processes to utilize multiple worker threads, offering a hybrid programming and execution environment similar to MPI+OpenMP, in a completely transparent way. Finally, to further improve the usability of our software, a Python interface has been implemented on top of both the OpenMP and MPI versions of the library. This allows sequential Python codes to exploit shared and distributed memory systems. Summary of revisions: The revised code improves the performance of both parallel (OpenMP and MPI) implementations. The functionality and the user-interface of the MPI-parallel version have been extended to support the asynchronous execution of multiple PNDL calls, issued by one or multiple MPI processes. A new underlying tasking library increases portability and allows MPI processes to have multiple worker threads. For both implementations, an interface to the Python programming language has been added. Restrictions: The library uses only double precision arithmetic. The MPI implementation assumes the homogeneity of the execution environment provided by the operating system. Specifically, the processes of a single MPI application must have identical address space and a user function resides at the same virtual address. In addition, address space layout randomization should not be used for the application. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 23 ms for the serial distribution, 25 ms for the OpenMP with 2 threads, 53 ms and 1.01 s for the MPI parallel distribution using 2 threads and 2 processes respectively and yield-time for idle workers equal to 10 ms. References: [1] P. Angelikopoulos, C. Paradimitriou, P. Koumoutsakos, Bayesian uncertainty quantification and propagation in molecular dynamics simulations: a high performance computing framework, J. Chem. Phys 137 (14). [2] H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput. 33 (1) (2011) 407-432. [3] M. Girolami, B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73 (2) (2011) 123-214. [4] P. Angelikopoulos, C. Paradimitriou, P. Koumoutsakos, Data driven, predictive molecular dynamics for nanoscale flow simulations under uncertainty, J. Phys. Chem. B 117 (47) (2013) 14808-14816. [5] P.E. Hadjidoukas, E. Lappas, V.V. Dimakopoulos, A runtime library for platform-independent task parallelism, in: PDP, IEEE, 2012, pp. 229-236. [6] C. Voglis, P.E. Hadjidoukas, D.G. Papageorgiou, I. Lagaris, A parallel hybrid optimization algorithm for fitting interatomic potentials, Appl. Soft Comput. 13 (12) (2013) 4481-4492. [7] P.E. Hadjidoukas, C. Voglis, V.V. Dimakopoulos, I. Lagaris, D.G. Papageorgiou, Supporting adaptive and irregular parallelism for non-linear numerical optimization, Appl. Math. Comput. 231 (2014) 544-559.

  2. An Analysis of the INGRES Database Management System Applications Program Development Tools and Programming Environment

    DTIC Science & Technology

    1986-12-01

    Position cursor over the naBe of a report, then use the appropriate enu iteffl to perforn an operation on that report. Naae Owner RBF? Last changed...LANGUAGE- INDEPENDENT, PORTABLE FILE ACCESS SY STEM A MODEL FOR AUTOMATIC FILE AND PROGRAM DESIGN IN BUSINE SS APPLICATION SYSTEM GENERALLY APPLICABLE...Article Description Year: 1988 Title: FLASH : A LANGUAGE- INDEPENDENT, PORTABLE FILE ACCESS SY STEM Authors: ALLCHIN.J.E., KaLER.A.H., WIEDERHOL.D.G

  3. On extending parallelism to serial simulators

    NASA Technical Reports Server (NTRS)

    Nicol, David; Heidelberger, Philip

    1994-01-01

    This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount off simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.

  4. Portable exhausters POR-004 SKID B, POR-005 SKID C, POR-006 SKID D storage plan

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nelson, O.D.

    1997-09-04

    This document provides a storage plan for portable exhausters POR-004 SKID B, POR-005 SKID C, AND POR-006 SKID D. The exhausters will be stored until they are needed by the TWRS (Tank Waste Remediation Systems) Saltwell Pumping Program. The storage plan provides criteria for portable exhauster storage, periodic inspections during storage, and retrieval from storage.

  5. Portable instrument for inspecting irradiated nuclear-fuel assemblies in a water-filled storage pond by measurement of induced Cerenkov radiation

    DOEpatents

    Nicholson, N.; Dowdy, E.J.; Holt, D.M.; Stump, C.J. Jr.

    1982-05-13

    A portable instrument for measuring induced Cerenkov radiation associated with irradiated nuclear fuel assemblies in a water-filled storage pond is disclosed. The instrument includes a photomultiplier tube and an image intensifier which are operable in parallel and simultaneously by means of a field lens assembly and an associated beam splitter. The image intensifier permits an operator to aim and focus the apparatus on a submerged fuel assembly. Once the instrument is aimed and focused, an illumination reading can be obtained with the photomultiplier tube. The instrument includes a lens cap with a carbon-14/phosphor light source for calibrating the apparatus in the field.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning.

  7. Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

    NASA Technical Reports Server (NTRS)

    Goldstein, David

    1991-01-01

    Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.

  8. Development of an ultra-portable ride quality meter.

    DOT National Transportation Integrated Search

    2012-12-01

    FRAs Office of Research and Development has funded the development of an ultra-portable ride quality meter (UPRQM) under the Small Business and Innovative Research (SBIR) program. Track inspectors can use the UPRQM to locate segments of track that...

  9. The Growth of m-Learning and the Growth of Mobile Computing: Parallel Developments

    ERIC Educational Resources Information Center

    Caudill, Jason G.

    2007-01-01

    m-Learning is made possible by the existence and application of mobile hardware and networking technology. By exploring the capabilities of these technologies, it is possible to construct a picture of how different components of m-Learning can be implemented. This paper will explore the major technologies currently in use: portable digital…

  10. On-chip electrical detection of parallel loop-mediated isothermal amplification with DG-BioFETs for the detection of foodborne bacterial pathogens

    USDA-ARS?s Scientific Manuscript database

    The use of field effect transistors (FETs) as the transduction element for the detection of DNA amplification reactions will enable portable and inexpensive nucleic acid analysis. Transistors used as biological sensors,or BioFETs, minimize the cost and size of detection platforms by leveraging fabri...

  11. Field Analysis of Polychlorinated Biphenyls (PCBs) in Soil Using Solid-Phase Microextraction (SPME) and a Portable Gas Chromatography-Mass Spectrometry System.

    PubMed

    Zhang, Mengliang; Kruse, Natalie A; Bowman, Jennifer R; Jackson, Glen P

    2016-05-01

    An expedited field analysis method was developed for the determination of polychlorinated biphenyls (PCBs) in soil matrices using a portable gas chromatography-mass spectrometry (GC-MS) instrument. Soil samples of approximately 0.5 g were measured with a portable scale and PCBs were extracted by headspace solid-phase microextraction (SPME) with a 100 µm polydimethylsiloxane (PDMS) fiber. Two milliliters of 0.2 M potassium permanganate and 0.5 mL of 6 M sulfuric acid solution were added to the soil matrices to facilitate the extraction of PCBs. The extraction was performed for 30 min at 100 ℃ in a portable heating block that was powered by a portable generator. The portable GC-MS instrument took less than 6 min per analysis and ran off an internal battery and helium cylinder. Six commercial PCB mixtures, Aroclor 1016, 1221, 1232, 1242, 1248, 1254, and 1260, could be classified based on the GC chromatograms and mass spectra. The detection limit of this method for Aroclor 1260 in soil matrices is approximately 10 ppm, which is sufficient for guiding remediation efforts in contaminated sites. This method was applicable to the on-site analysis of PCBs with a total analysis time of 37 min per sample. However, the total analysis time could be improved to less than 7 min per sample by conducting the rate-limiting extraction step for different samples in parallel. © The Author(s) 2016.

  12. Simulated parallel annealing within a neighborhood for optimization of biomechanical systems.

    PubMed

    Higginson, J S; Neptune, R R; Anderson, F C

    2005-09-01

    Optimization problems for biomechanical systems have become extremely complex. Simulated annealing (SA) algorithms have performed well in a variety of test problems and biomechanical applications; however, despite advances in computer speed, convergence to optimal solutions for systems of even moderate complexity has remained prohibitive. The objective of this study was to develop a portable parallel version of a SA algorithm for solving optimization problems in biomechanics. The algorithm for simulated parallel annealing within a neighborhood (SPAN) was designed to minimize interprocessor communication time and closely retain the heuristics of the serial SA algorithm. The computational speed of the SPAN algorithm scaled linearly with the number of processors on different computer platforms for a simple quadratic test problem and for a more complex forward dynamic simulation of human pedaling.

  13. A convenient and accurate parallel Input/Output USB device for E-Prime.

    PubMed

    Canto, Rosario; Bufalari, Ilaria; D'Ausilio, Alessandro

    2011-03-01

    Psychological and neurophysiological experiments require the accurate control of timing and synchrony for Input/Output signals. For instance, a typical Event-Related Potential (ERP) study requires an extremely accurate synchronization of stimulus delivery with recordings. This is typically done via computer software such as E-Prime, and fast communications are typically assured by the Parallel Port (PP). However, the PP is an old and disappearing technology that, for example, is no longer available on portable computers. Here we propose a convenient USB device enabling parallel I/O capabilities. We tested this device against the PP on both a desktop and a laptop machine in different stress tests. Our data demonstrate the accuracy of our system, which suggests that it may be a good substitute for the PP with E-Prime.

  14. A transient-enhanced NMOS low dropout voltage regulator with parallel feedback compensation

    NASA Astrophysics Data System (ADS)

    Han, Wang; Lin, Tan

    2016-02-01

    This paper presents a transient-enhanced NMOS low-dropout regulator (LDO) for portable applications with parallel feedback compensation. The parallel feedback structure adds a dynamic zero to get an adequate phase margin with a load current variation from 0 to 1 A. A class-AB error amplifier and a fast charging/discharging unit are adopted to enhance the transient performance. The proposed LDO has been implemented in a 0.35 μm BCD process. From experimental results, the regulator can operate with a minimum dropout voltage of 150 mV at a maximum 1 A load and IQ of 165 μA. Under the full range load current step, the voltage undershoot and overshoot of the proposed LDO are reduced to 38 mV and 27 mV respectively.

  15. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, a SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as benchmark for testing high-end parallel computers.

  16. Comparison between two portable devices for widefield PpIX fluorescence during cervical intraepithelial neoplasia treatment

    NASA Astrophysics Data System (ADS)

    Carbinatto, Fernanda M.; Inada, Natalia Mayumi; Lombardi, Welington; Cossetin, Natália Fernandez; Varoto, Cinthia; Kurachi, Cristina; Bagnato, Vanderlei Salvador

    2015-06-01

    The use of portable electronic devices, in particular mobile phones such as smartphones is increasing not only for all known applications, but also for diagnosis of diseases and monitoring treatments like topical Photodynamic Therapy. The aim of the study is to evaluate the production of the photosensitizer Protoporphyrin IX (PpIX) after topical application of a cream containing methyl aminolevulinate (MAL) in the cervix with diagnosis of Cervical Intraepithelial Neoplasia (CIN) through the fluorescence images captured after one and three hours and compare the images using two devices (a Sony Xperia® mobile and an Apple Ipod®. Was observed an increasing fluorescence intensity of the cervix three hours after cream application, in both portable electronic devices. However, because was used a specific program for the treatment of images using the Ipod® device, these images presented better resolution than observed by the Sony cell phone without a specific program. One hour after cream application presented a more selective fluorescence than the group of three hours. In conclusion, the use of portable devices to obtain images of PpIX fluorescence shown to be an effective tool and is necessary the improvement of programs for achievement of better results.

  17. PETSc Users Manual Revision 3.3

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balay, S.; Brown, J.; Buschelman, K.

    This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself; For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less

  18. PETSc Users Manual Revision 3.4

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balay, S.; Brown, J.; Buschelman, K.

    This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself; For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less

  19. PETSc Users Manual Revision 3.5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balay, S.; Abhyankar, S.; Adams, M.

    This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself. ;For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less

  20. A fiber-coupled displacement measuring interferometer for determination of the posture of a reflective surface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mao, Shuai; Hu, Peng-Cheng, E-mail: hupc@hit.edu.cn; Ding, Xue-Mei, E-mail: X.M.Ding@outlook.com

    A fiber-coupled displacement measuring interferometer capable of determining of the posture of a reflective surface of a measuring mirror is proposed. The newly constructed instrument combines fiber-coupled displacement and angular measurement technologies. The proposed interferometer has advantages of both the fiber-coupled and the spatially beam-separated interferometer. A portable dual-position sensitive detector (PSD)-based unit within this proposed interferometer measures the parallelism of the two source beams to guide the fiber-coupling adjustment. The portable dual PSD-based unit measures not only the pitch and yaw of the retro-reflector but also measures the posture of the reflective surface. The experimental results of displacement calibrationmore » show that the deviations between the proposed interferometer and a reference one, Agilent 5530, at two different common beam directions are both less than ±35 nm, thus verifying the effectiveness of the beam parallelism measurement. The experimental results of angular calibration show that deviations of pitch and yaw with the auto-collimator (as a reference) are less than ±2 arc sec, thus proving the proposed interferometer’s effectiveness for determination of the posture of a reflective surface.« less

  1. THE SITE DEMONSTRATION OF CHEMFIX SOLIDIFICATION/ STABILIZATION PROCESS AT THE PORTABLE EQUIPMENT SALVAGE COMPANY SITE

    EPA Science Inventory

    A demonstration of the GHEMFIX solidification/stabilization process was conducted under the United States Environmental Protection Agency`s (EPA) Superfund Innovative Technology Evaluation (SITE) program. The demonstration was conducted in March 1989, at the Portable Equipment Sa...

  2. OCTGRAV: Sparse Octree Gravitational N-body Code on Graphics Processing Units

    NASA Astrophysics Data System (ADS)

    Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

    2010-10-01

    Octgrav is a very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way, a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s is achieved. It takes about a second to compute forces on a million particles with an opening angle of heta approx 0.5. To test the performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code which completely runs on the GPU. The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second. The code has a convenient user interface and is freely available for use.

  3. A Tutorial on Parallel and Concurrent Programming in Haskell

    NASA Astrophysics Data System (ADS)

    Peyton Jones, Simon; Singh, Satnam

    This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.

  4. Portable home hemodialysis for kidney failure.

    PubMed

    Scott, A

    2007-11-01

    (1) Home hemodialysis has been in limited use in Canada for some time. Newer, portable hemodialysis machines that are easier for patients to operate may encourage the uptake of this technology. (2) One portable system is already available in the US. The NxStage System One hemodialysis machine operates on standard electric current, does not require plumbing or specialized disinfection, and is small enough for patients to travel with. (3) It is not yet clear whether the use of the NxStage system improves long-term survival and quality of life. (4) Home hemodialysis is less costly than conventional in-centre programs, but it is unknown whether these savings extend to portable devices.

  5. Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications.

    PubMed

    Cabezas, Javier; Gelado, Isaac; Stone, John E; Navarro, Nacho; Kirk, David B; Hwu, Wen-Mei

    2015-05-01

    Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models force programmers to resort to multiple code versions, complex data copy steps and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems as well as adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that support key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in a exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice.

  6. Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications

    PubMed Central

    Cabezas, Javier; Gelado, Isaac; Stone, John E.; Navarro, Nacho; Kirk, David B.; Hwu, Wen-mei

    2014-01-01

    Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models force programmers to resort to multiple code versions, complex data copy steps and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems as well as adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that support key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in a exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice. PMID:26180487

  7. Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

    NASA Technical Reports Server (NTRS)

    Overman, Andrea L.; Poole, Eugene L.

    1991-01-01

    A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.

  8. Medicine clerkships and portable computing: a national survey of internal medicine clerkship directors.

    PubMed

    Ferenchick, Gary; Solomon, David; Durning, Steven J

    2010-01-01

    Portable computers are widely used by medical trainees, but there is a lack of data on how these devices are used in clinical education programs. The objective is to define the current use of portable computing in internal medicine clerkships and to determine medicine clerkship directors' perceptions of the current value and future importance of portable computing. A 2006 national survey of institutional members of the Clerkship Directors in Internal Medicine. Eighty-three of 110 (75%) of institutional members responded. An institutional requirement for portable computing was reported by 32 schools (39%), whereas only 13 (16%) provided students with a portable computer. Between 10 and 31 institutions (12-37%) reported student use for patient care activities (i.e. order entry, writing patient notes) and only 2 to 4 institutions (2-5%) required such use. The majority of respondents (59-95%) reported portable computer use for educational activities (i.e., tracking patient problems, knowledge resource), however, only in 5 to 19 (6-23%) were such educational uses required. Fifty-six respondents (68%) reported that portable computer's "added value" for teaching and 61 (73%) reported that portable computers would be important in meeting clerkship objectives in the next 3 years. Of interest, even among the institutions requiring portable computers, only 50% recommended or required specific software. Portable computing is required at 39% of allopathic medical schools in the United States. However required portable computing for specific patient care or educational tasks is uncommon. In addition, guidance on specific software exists in only one half of school requiring portable computers, suggesting informal or unstructured uses of required portable computer's in the remaining half. The educational impact of formal institutional requirements for software versus informal "user-defined" applications is unknown.

  9. The Use of Portable Microcomputers to Collect Student and Teacher Behavior Data.

    ERIC Educational Resources Information Center

    Rieth, Herbert; And Others

    1989-01-01

    Using portable microcomputers, three applications programs were developed and implemented to collect, store, transmit, and analyze teacher/student observational data. The three applications involved: analyzing teaching behaviors of trainees in field-site placements, using microcomputers to educate mildly handicapped high-school students, and using…

  10. The Use of Cloud Technology in Athletic Training Education

    ERIC Educational Resources Information Center

    Perkey, Dennis

    2012-01-01

    As technology advances and becomes more portable, athletic training educators (ATEs) have many options available to them. Whether attempting to streamline efforts in courses, or operate a more efficient athletic training education program, portable technology is becoming an important tool that will assist the ATE. One tool that allows more…

  11. ESTE: Verification of Portable Optical and Thermal Imaging Devices for Leak Detection at Petroleum Refineries and Chemical Plants

    EPA Science Inventory

    This is an ESTE project summary brief. EPA’s Environmental Technology Verification Program (ETV) is verifying the performance of portable optical and thermal imaging devices for leak detection at petroleum refineries and chemical plans. Industrial facilities, such as chemical p...

  12. PETSc Users Manual Revision 3.7

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balay, Satish; Abhyankar, S.; Adams, M.

    This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.

  13. PETSc Users Manual Revision 3.8

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balay, S.; Abhyankar, S.; Adams, M.

    This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.

  14. Parallel computing in experimental mechanics and optical measurement: A review (II)

    NASA Astrophysics Data System (ADS)

    Wang, Tianyi; Kemao, Qian

    2018-05-01

    With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have successfully integrated into various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolutions for higher accuracy, the computation burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composing of hardware such as CPUs and GPUs, have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM & OM applications in a broad scope, which include digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.

  15. Summary of the Ratfor language, an extended portable dialect called REP, its style and flavor, and details of its implementation on the PDP-10

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Figen, J.

    1981-09-10

    Ratfor (RATional FORtran) is a dialect of Fortran that allows structured programming and the use of simple macros. It is the language of the Software Tools package, and is documented in the book Software Tools. It has proved significantly easier than Fortran to read, write, and understand. Although debugging is slightly harder in Ratfor than in Fortran, there is usually less of it to do, since Ratfor lends itself to better program design. Ratfor operates as a preprocessor to any existing Fortran system. It is relatively easy, using Ratfor, to write programs that are portable with little or no changemore » to any environment that supports standard Fortran. REP (Ratfor Extended and Portable) is an extended version of Ratfor. It is fully upward compatible with the Addison-Wesley translator, though there are a few divergences from certain Unix and Software Tools User Group dialects. The principal feature of REP is its fully developed macro facility, a language within a language, capable of doing such things as creating new data types, data structuring, recursive procedures, and much more, portably, and in the spirit of Ratfor, but there are many other lesser though nevertheless important extensions.« less

  16. Facial emotion recognition system for autistic children: a feasible study based on FPGA implementation.

    PubMed

    Smitha, K G; Vinod, A P

    2015-11-01

    Children with autism spectrum disorder have difficulty in understanding the emotional and mental states from the facial expressions of the people they interact. The inability to understand other people's emotions will hinder their interpersonal communication. Though many facial emotion recognition algorithms have been proposed in the literature, they are mainly intended for processing by a personal computer, which limits their usability in on-the-move applications where portability is desired. The portability of the system will ensure ease of use and real-time emotion recognition and that will aid for immediate feedback while communicating with caretakers. Principal component analysis (PCA) has been identified as the least complex feature extraction algorithm to be implemented in hardware. In this paper, we present a detailed study of the implementation of serial and parallel implementation of PCA in order to identify the most feasible method for realization of a portable emotion detector for autistic children. The proposed emotion recognizer architectures are implemented on Virtex 7 XC7VX330T FFG1761-3 FPGA. We achieved 82.3% detection accuracy for a word length of 8 bits.

  17. A low cost, disposable cable-shaped Al-air battery for portable biosensors

    NASA Astrophysics Data System (ADS)

    Fotouhi, Gareth; Ogier, Caleb; Kim, Jong-Hoon; Kim, Sooyeun; Cao, Guozhong; Shen, Amy Q.; Kramlich, John; Chung, Jae-Hyun

    2016-05-01

    A disposable cable-shaped flexible battery is presented using a simple, low cost manufacturing process. The working principle of an aluminum-air galvanic cell is used for the cable-shaped battery to power portable and point-of-care medical devices. The battery is catalyzed with a carbon nanotube (CNT)-paper matrix. A scalable manufacturing process using a lathe is developed to wrap a paper layer and a CNT-paper matrix on an aluminum wire. The matrix is then wrapped with a silver-plated copper wire to form the battery cell. The battery is activated through absorption of electrolytes including phosphate-buffered saline, NaOH, urine, saliva, and blood into the CNT-paper matrix. The maximum electric power using a 10 mm-long battery cell is over 1.5 mW. As a demonstration, an LED is powered using two groups of four batteries in parallel connected in series. Considering the material composition and the cable-shaped configuration, the battery is fully disposable, flexible, and potentially compatible with portable biosensors through activation by either reagents or biological fluids.

  18. OpenSWPC: an open-source integrated parallel simulation code for modeling seismic wave propagation in 3D heterogeneous viscoelastic media

    NASA Astrophysics Data System (ADS)

    Maeda, Takuto; Takemura, Shunsuke; Furumura, Takashi

    2017-07-01

    We have developed an open-source software package, Open-source Seismic Wave Propagation Code (OpenSWPC), for parallel numerical simulations of seismic wave propagation in 3D and 2D (P-SV and SH) viscoelastic media based on the finite difference method in local-to-regional scales. This code is equipped with a frequency-independent attenuation model based on the generalized Zener body and an efficient perfectly matched layer for absorbing boundary condition. A hybrid-style programming using OpenMP and the Message Passing Interface (MPI) is adopted for efficient parallel computation. OpenSWPC has wide applicability for seismological studies and great portability to allowing excellent performance from PC clusters to supercomputers. Without modifying the code, users can conduct seismic wave propagation simulations using their own velocity structure models and the necessary source representations by specifying them in an input parameter file. The code has various modes for different types of velocity structure model input and different source representations such as single force, moment tensor and plane-wave incidence, which can easily be selected via the input parameters. Widely used binary data formats, the Network Common Data Form (NetCDF) and the Seismic Analysis Code (SAC) are adopted for the input of the heterogeneous structure model and the outputs of the simulation results, so users can easily handle the input/output datasets. All codes are written in Fortran 2003 and are available with detailed documents in a public repository.[Figure not available: see fulltext.

  19. UPEML Version 2. 0: A machine-portable CDC Update emulator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mehlhorn, T.A.; Young, M.F.

    1987-05-01

    UPEML is a machine-portable CDC Update emulation program. UPEML is written in ANSI standard Fortran-77 and is relatively simple and compact. It is capable of emulating a significant subset of the standard CDC Update functions, including program library creation and subsequent modification. Machine-portability is an essential attribute of UPEML. UPEML was written primarily to facilitate the use of CDC-based scientific packages on alternate computer systems such as the VAX 11/780 and the IBM 3081. UPEML has also been successfully used on the multiprocessor ELXSI, on CRAYs under both COS and CTSS operating systems, on APOLLO workstations, and on the HP-9000.more » Version 2.0 includes enhanced error checking, full ASCI character support, a program library audit capability, and a partial update option in which only selected or modified decks are written to the compile file. Further enhancements include checks for overlapping corrections, processing of nested calls to common decks, and reads and addfiles from alternate input files.« less

  20. S-HARP: A parallel dynamic spectral partitioner

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sohn, A.; Simon, H.

    1998-01-01

    Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Messagemore » Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.« less

  1. Preliminary comparisons of portable near infrared (nir) instrumentation for laboratory measurements of cotton fiber micronaire

    USDA-ARS?s Scientific Manuscript database

    Micronaire is a key quality and processing parameter for cotton fiber. A program was implemented to determine the capabilities of portable Near Infrared (NIR) instrumentation to monitor cotton fiber micronaire both in the laboratory and in/near the field. Previous evaluations on one NIR unit demon...

  2. Video. A Guide to the Use of Portable Video Equipment.

    ERIC Educational Resources Information Center

    Rowatt, Robert W.

    This guide describes the portable equipment necessary for preparing a video production, and recommends ways of using that equipment to create a video program. Step by step instructions are provided for setting up the equipment for battery operation or with a mains electricity supply. Information is also given on procedures for recording, playing…

  3. Personal, Portable, Multifunction-Devices and School Libraries

    ERIC Educational Resources Information Center

    Weaver, Anne

    2010-01-01

    To maximise learning value from one-to-one programs in schools, computing devices need to be personal, portable and multifunctional. It is likely that shared devices will not be as effective. The increased access provided by one-to-one devices creates great opportunities for school librarians to support their school technology directions and to…

  4. CACREP Accreditation: A Solution to License Portability and Counselor Identity Problems

    ERIC Educational Resources Information Center

    Mascari, J. Barry; Webber, Jane

    2013-01-01

    A confluence of forces addressing counselor identity occurred with the 20/20: A Vision for the Future of Counseling initiative, the 2009 Standards of the Council for Accreditation of Counseling and Related Education Programs (CACREP), and the quest by the American Association of State Counseling Boards to establish license portability. This…

  5. Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Earl, Christopher; Might, Matthew; Bagusetty, Abhishek

    This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.

  6. Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations

    DOE PAGES

    Earl, Christopher; Might, Matthew; Bagusetty, Abhishek; ...

    2016-01-26

    This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.

  7. Identifying and quantifying short-lived fission products from thermal fission of HEU using portable HPGe detectors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pierson, Bruce D.; Finn, Erin C.; Friese, Judah I.

    2013-03-01

    Due to the emerging potential for trafficking of special nuclear material, research programs are investigating current capabilities of commercially available portable gamma ray detection systems. Presented in this paper are the results of three different portable high-purity germanium (HPGe) detectors used to identify short-lived fission products generated from thermal neutron interrogation of small samples of highly enriched uranium. Samples were irradiated at the Washington State University (WSU) Nuclear Radiation Center’s 1MW TRIGA reactor. The three portable, HPGe detectors used were the ORTEC MicroDetective, the ORTEC Detective, and the Canberra Falcon. Canberra’s GENIE-2000 software was used to analyze the spectral datamore » collected from each detector. Ultimately, these three portable detectors were able to identify a large range of fission products showing potential for material discrimination.« less

  8. A programmable and portable NMES device for drop foot correction and blood flow assist applications.

    PubMed

    Breen, Paul P; Corley, Gavin J; O'Keeffe, Derek T; Conway, Richard; Olaighin, Gearóid

    2009-04-01

    The Duo-STIM, a new, programmable and portable neuromuscular stimulation system for drop foot correction and blood flow assist applications is presented. The system consists of a programmer unit and a portable, programmable stimulator unit. The portable stimulator features fully programmable, sensor-controlled, constant-voltage, dual-channel stimulation and accommodates a range of customized stimulation profiles. Trapezoidal and free-form adaptive stimulation intensity envelope algorithms are provided for drop foot correction applications, while time dependent and activity dependent algorithms are provided for blood flow assist applications. A variety of sensor types can be used with the portable unit, including force sensitive resistor-based foot switches and MEMS-based accelerometer and gyroscope devices. The paper provides a detailed description of the hardware and block-level system design for both units. The programming and operating procedures for the system are also presented. Finally, functional bench test results for the system are presented.

  9. A programmable and portable NMES device for drop foot correction and blood flow assist applications.

    PubMed

    Breen, Paul P; Corley, Gavin J; O'Keeffe, Derek T; Conway, Richard; OLaighin, Gearoid

    2007-01-01

    The Duo-STIM, a new, programmable and portable neuromuscular stimulation system for drop foot correction and blood flow assist applications is presented. The system consists of a programmer unit and a portable, programmable stimulator unit. The portable stimulator features fully programmable, sensor-controlled, constant-voltage, dual-channel stimulation and accommodates a range of customized stimulation profiles. Trapezoidal and free-form adaptive stimulation intensity envelope algorithms are provided for drop foot correction applications, while time dependent and activity dependent algorithms are provided for blood flow assist applications. A variety of sensor types can be used with the portable unit, including force sensitive resistor based foot switches and NMES based accelerometer and gyroscope devices. The paper provides a detailed description of the hardware and block-level system design for both units. The programming and operating procedures for the system are also presented. Finally, functional bench test results for the system are presented.

  10. A portable battery for objective, non-obstrusive measures of human performances

    NASA Technical Reports Server (NTRS)

    Kennedy, R. S.

    1984-01-01

    The need for a standardized battery of human performance tests to measure the effects of various treatments is pointed out. Progress in such a program is reported. Three batteries are available which differ in length and the number of tests in the battery. All tests are implemented on a portable, lap held, briefcase size microprocessor. Performances measured include: information processing, memory, visual perception, reasoning, and motor skills, programs to determine norms, reliabilities, stabilities, factor structure of tests, comparisons with marker tests, apparatus suitability. Rationale for the battery is provided.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bonachea, Dan; Hargrove, P.

    GASNet is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages and libraries such as UPC, UPC++, Co-Array Fortran, Legion, Chapel, and many others. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet stands for "Global-Address Space Networking".

  12. Single Sided Messaging v. 0.6.6

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Curry, Matthew Leon; Farmer, Matthew Shane; Hassani, Amin

    Single-Sided Messaging (SSM) is a portable, multitransport networking library that enables applications to leverage potential one-sided capabilities of underlying network transports. It also provides desirable semantics that services for highperformance, massively parallel computers can leverage, such as an explicit cancel operation for pending transmissions, as well as enhanced matching semantics favoring large numbers of buffers attached to a single match entry. This release supports TCP/IP, shared memory, and Infiniband.

  13. Component-based integration of chemistry and optimization software.

    PubMed

    Kenny, Joseph P; Benson, Steven J; Alexeev, Yuri; Sarich, Jason; Janssen, Curtis L; McInnes, Lois Curfman; Krishnan, Manojkumar; Nieplocha, Jarek; Jurrus, Elizabeth; Fahlstrom, Carl; Windus, Theresa L

    2004-11-15

    Typical scientific software designs make rigid assumptions regarding programming language and data structures, frustrating software interoperability and scientific collaboration. Component-based software engineering is an emerging approach to managing the increasing complexity of scientific software. Component technology facilitates code interoperability and reuse. Through the adoption of methodology and tools developed by the Common Component Architecture Forum, we have developed a component architecture for molecular structure optimization. Using the NWChem and Massively Parallel Quantum Chemistry packages, we have produced chemistry components that provide capacity for energy and energy derivative evaluation. We have constructed geometry optimization applications by integrating the Toolkit for Advanced Optimization, Portable Extensible Toolkit for Scientific Computation, and Global Arrays packages, which provide optimization and linear algebra capabilities. We present a brief overview of the component development process and a description of abstract interfaces for chemical optimizations. The components conforming to these abstract interfaces allow the construction of applications using different chemistry and mathematics packages interchangeably. Initial numerical results for the component software demonstrate good performance, and highlight potential research enabled by this platform.

  14. High performance in silico virtual drug screening on many-core processors.

    PubMed

    McIntosh-Smith, Simon; Price, James; Sessions, Richard B; Ibarra, Amaurys A

    2015-05-01

    Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.

  15. High performance in silico virtual drug screening on many-core processors

    PubMed Central

    Price, James; Sessions, Richard B; Ibarra, Amaurys A

    2015-01-01

    Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel’s Xeon Phi and multi-core CPUs with SIMD instruction sets. PMID:25972727

  16. SSRL Emergency Response Shore Tool

    NASA Technical Reports Server (NTRS)

    Mah, Robert W.; Papasin, Richard; McIntosh, Dawn M.; Denham, Douglas; Jorgensen, Charles; Betts, Bradley J.; Del Mundo, Rommel

    2006-01-01

    The SSRL Emergency Response Shore Tool (wherein SSRL signifies Smart Systems Research Laboratory ) is a computer program within a system of communication and mobile-computing software and hardware being developed to increase the situational awareness of first responders at building collapses. This program is intended for use mainly in planning and constructing shores to stabilize partially collapsed structures. The program consists of client and server components, runs in the Windows operating system on commercial off-the-shelf portable computers, and can utilize such additional hardware as digital cameras and Global Positioning System devices. A first responder can enter directly, into a portable computer running this program, the dimensions of a required shore. The shore dimensions, plus an optional digital photograph of the shore site, can then be uploaded via a wireless network to a server. Once on the server, the shore report is time-stamped and made available on similarly equipped portable computers carried by other first responders, including shore wood cutters and an incident commander. The staff in a command center can use the shore reports and photographs to monitor progress and to consult with structural engineers to assess whether a building is in imminent danger of further collapse.

  17. NLSEmagic: Nonlinear Schrödinger equation multi-dimensional Matlab-based GPU-accelerated integrators using compact high-order schemes

    NASA Astrophysics Data System (ADS)

    Caplan, R. M.

    2013-04-01

    We present a simple to use, yet powerful code package called NLSEmagic to numerically integrate the nonlinear Schrödinger equation in one, two, and three dimensions. NLSEmagic is a high-order finite-difference code package which utilizes graphic processing unit (GPU) parallel architectures. The codes running on the GPU are many times faster than their serial counterparts, and are much cheaper to run than on standard parallel clusters. The codes are developed with usability and portability in mind, and therefore are written to interface with MATLAB utilizing custom GPU-enabled C codes with the MEX-compiler interface. The packages are freely distributed, including user manuals and set-up files. Catalogue identifier: AEOJ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOJ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 124453 No. of bytes in distributed program, including test data, etc.: 4728604 Distribution format: tar.gz Programming language: C, CUDA, MATLAB. Computer: PC, MAC. Operating system: Windows, MacOS, Linux. Has the code been vectorized or parallelized?: Yes. Number of processors used: Single CPU, number of GPU processors dependent on chosen GPU card (max is currently 3072 cores on GeForce GTX 690). Supplementary material: Setup guide, Installation guide. RAM: Highly dependent on dimensionality and grid size. For typical medium-large problem size in three dimensions, 4GB is sufficient. Keywords: Nonlinear Schröodinger Equation, GPU, high-order finite difference, Bose-Einstien condensates. Classification: 4.3, 7.7. Nature of problem: Integrate solutions of the time-dependent one-, two-, and three-dimensional cubic nonlinear Schrödinger equation. Solution method: The integrators utilize a fully-explicit fourth-order Runge-Kutta scheme in time and both second- and fourth-order differencing in space. The integrators are written to run on NVIDIA GPUs and are interfaced with MATLAB including built-in visualization and analysis tools. Restrictions: The main restriction for the GPU integrators is the amount of RAM on the GPU as the code is currently only designed for running on a single GPU. Unusual features: Ability to visualize real-time simulations through the interaction of MATLAB and the compiled GPU integrators. Additional comments: Setup guide and Installation guide provided. Program has a dedicated web site at www.nlsemagic.com. Running time: A three-dimensional run with a grid dimension of 87×87×203 for 3360 time steps (100 non-dimensional time units) takes about one and a half minutes on a GeForce GTX 580 GPU card.

  18. Parallel programming with Easy Java Simulations

    NASA Astrophysics Data System (ADS)

    Esquembre, F.; Christian, W.; Belloni, M.

    2018-01-01

    Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a java-based programming environment to treat problems in the usual undergraduate curriculum. We use the easy java simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.

  19. A Research Program in Computer Technology. Volume 1

    DTIC Science & Technology

    1981-08-01

    rigidity, sensor networks 10. command and control, digital voice communication, graphic input device for terminal, multimedia communications, portable...satellite channel in the internetwork environment; Distributed Sensor Networks - formulation of algorithms and communication protocols to support the...operation of geographically distributed sensors ; Personal Communicator - work intended to result in a demonstration-level portable terminal to test and

  20. Genetic Parallel Programming: design and implementation.

    PubMed

    Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

    2006-01-01

    This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

  1. Portable Computer Technology (PCT) Research and Development Program Phase 2

    NASA Technical Reports Server (NTRS)

    Castillo, Michael; McGuire, Kenyon; Sorgi, Alan

    1995-01-01

    The subject of this project report, focused on: (1) Design and development of two Advanced Portable Workstation 2 (APW 2) units. These units incorporate advanced technology features such as a low power Pentium processor, a high resolution color display, National Television Standards Committee (NTSC) video handling capabilities, a Personal Computer Memory Card International Association (PCMCIA) interface, and Small Computer System Interface (SCSI) and ethernet interfaces. (2) Use these units to integrate and demonstrate advanced wireless network and portable video capabilities. (3) Qualification of the APW 2 systems for use in specific experiments aboard the Mir Space Station. A major objective of the PCT Phase 2 program was to help guide future choices in computing platforms and techniques for meeting National Aeronautics and Space Administration (NASA) mission objectives. The focus being on the development of optimal configurations of computing hardware, software applications, and network technologies for use on NASA missions.

  2. Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.

    1999-01-01

    The ability to dynamically adapt an unstructured grid is a powerful tool for efficiently solving computational problems with evolving physical features. In this paper, we report on our experience parallelizing an edge-based adaptation scheme, called 3D_TAG. using message passing. Results show excellent speedup when a realistic helicopter rotor mesh is randomly refined. However. performance deteriorates when the mesh is refined using a solution-based error indicator since mesh adaptation for practical problems occurs in a localized region., creating a severe load imbalance. To address this problem, we have developed PLUM, a global dynamic load balancing framework for adaptive numerical computations. Even though PLUM primarily balances processor workloads for the solution phase, it reduces the load imbalance problem within mesh adaptation by repartitioning the mesh after targeting edges for refinement but before the actual subdivision. This dramatically improves the performance of parallel 3D_TAG since refinement occurs in a more load balanced fashion. We also present optimal and heuristic algorithms that, when applied to the default mapping of a parallel repartitioner, significantly reduce the data redistribution overhead. Finally, portability is examined by comparing performance on three state-of-the-art parallel machines.

  3. A geostationary satellite system for mobile multimedia applications using portable, aeronautical and mobile terminals

    NASA Technical Reports Server (NTRS)

    Losquadro, G.; Luglio, M.; Vatalaro, F.

    1997-01-01

    A geostationary satellite system for mobile multimedia services via portable, aeronautical and mobile terminals was developed within the framework of the Advanced Communications Technology Service (ACTS) programs. The architecture of the system developed under the 'satellite extremely high frequency communications for multimedia mobile services (SECOMS)/ACTS broadband aeronautical terminal experiment' (ABATE) project is presented. The system will be composed of a Ka band system component, and an extremely high frequency band component. The major characteristics of the space segment, the ground control station and the portable, aeronautical and mobile user terminals are outlined.

  4. ms2: A molecular simulation tool for thermodynamic properties

    NASA Astrophysics Data System (ADS)

    Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran

    2011-11-01

    This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for a fast execution on a broad range of computer architectures, spanning from single processor PCs over PC-clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 has been shown for a large variety of fluids in preceding work. Program summaryProgram title:ms2 Catalogue identifier: AEJF_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Special Licence supplied by the authors No. of lines in distributed program, including test data, etc.: 82 794 No. of bytes in distributed program, including test data, etc.: 793 705 Distribution format: tar.gz Programming language: Fortran90 Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single processor machines over PC-clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.) Operating system: Unix/Linux, Windows Has the code been vectorized or parallelized?: Yes. Message Passing Interface (MPI) protocol Scalability. Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations. RAM:ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules. Classification: 7.7, 7.9, 12 External routines: Message Passing Interface (MPI) Nature of problem: Calculation of application oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures. Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism. Restrictions: No. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less. Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories. Additional comments: Sample makefiles for multiple operation platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de. Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours, calculating transport properties between six and 24 hours.

  5. System and method for acquisition management of subject position information

    DOEpatents

    Carrender, Curt

    2005-12-13

    A system and method for acquisition management of subject position information that utilizes radio frequency identification (RF ID) to store position information in position tags. Tag programmers receive position information from external positioning systems, such as the Global Positioning System (GPS), from manual inputs, such as keypads, or other tag programmers. The tag programmers program each position tag with the received position information. Both the tag programmers and the position tags can be portable or fixed. Implementations include portable tag programmers and fixed position tags for subject position guidance, and portable tag programmers for collection sample labeling. Other implementations include fixed tag programmers and portable position tags for subject route recordation. Position tags can contain other associated information such as destination address of an affixed subject for subject routing.

  6. System and method for acquisition management of subject position information

    DOEpatents

    Carrender, Curt [Morgan Hill, CA

    2007-01-23

    A system and method for acquisition management of subject position information that utilizes radio frequency identification (RF ID) to store position information in position tags. Tag programmers receive position information from external positioning systems, such as the Global Positioning System (GPS), from manual inputs, such as keypads, or other tag programmers. The tag programmers program each position tag with the received position information. Both the tag programmers and the position tags can be portable or fixed. Implementations include portable tag programmers and fixed position tags for subject position guidance, and portable tag programmers for collection sample labeling. Other implementations include fixed tag programmers and portable position tags for subject route recordation. Position tags can contain other associated information such as destination address of an affixed subject for subject routing.

  7. Evaluation and Research Program for the Portable Braille Recorder (PBR). Volume III. Appendix 19, Digicassette Operating Manual.

    ERIC Educational Resources Information Center

    Bourgeois, Michelle; And Others

    The operating manual for the Digicassette (or Portable Braille Recorder--PBR), an electronic braille reading and writing machine and an audiotape recorder which is compact and easy to carry around, is presented. Following a preface and an introduction are sections which address the following topics: orientation to parts and functions of the…

  8. A review of digital microfluidics as portable platforms for lab-on a-chip applications.

    PubMed

    Samiei, Ehsan; Tabrizian, Maryam; Hoorfar, Mina

    2016-07-07

    Following the development of microfluidic systems, there has been a high tendency towards developing lab-on-a-chip devices for biochemical applications. A great deal of effort has been devoted to improve and advance these devices with the goal of performing complete sets of biochemical assays on the device and possibly developing portable platforms for point of care applications. Among the different microfluidic systems used for such a purpose, digital microfluidics (DMF) shows high flexibility and capability of performing multiplex and parallel biochemical operations, and hence, has been considered as a suitable candidate for lab-on-a-chip applications. In this review, we discuss the most recent advances in the DMF platforms, and evaluate the feasibility of developing multifunctional packages for performing complete sets of processes of biochemical assays, particularly for point-of-care applications. The progress in the development of DMF systems is reviewed from eight different aspects, including device fabrication, basic fluidic operations, automation, manipulation of biological samples, advanced operations, detection, biological applications, and finally, packaging and portability of the DMF devices. Success in developing the lab-on-a-chip DMF devices will be concluded based on the advances achieved in each of these aspects.

  9. Simulating Hydrologic Flow and Reactive Transport with PFLOTRAN and PETSc on Emerging Fine-Grained Parallel Computer Architectures

    NASA Astrophysics Data System (ADS)

    Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.

    2017-12-01

    As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.

  10. A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jack Dongarra; Shirley Moore; Bart Miller, Jeffrey Hollingsworth

    2005-03-15

    The purpose of this project was to build an extensible cross-platform infrastructure to facilitate the development of accurate and portable performance analysis tools for current and future high performance computing (HPC) architectures. Major accomplishments include tools and techniques for multidimensional performance analysis, as well as improved support for dynamic performance monitoring of multithreaded and multiprocess applications. Previous performance tool development has been limited by the burden of having to re-write a platform-dependent low-level substrate for each architecture/operating system pair in order to obtain the necessary performance data from the system. Manual interpretation of performance data is not scalable for large-scalemore » long-running applications. The infrastructure developed by this project provides a foundation for building portable and scalable performance analysis tools, with the end goal being to provide application developers with the information they need to analyze, understand, and tune the performance of terascale applications on HPC architectures. The backend portion of the infrastructure provides runtime instrumentation capability and access to hardware performance counters, with thread-safety for shared memory environments and a communication substrate to support instrumentation of multiprocess and distributed programs. Front end interfaces provides tool developers with a well-defined, platform-independent set of calls for requesting performance data. End-user tools have been developed that demonstrate runtime data collection, on-line and off-line analysis of performance data, and multidimensional performance analysis. The infrastructure is based on two underlying performance instrumentation technologies. These technologies are the PAPI cross-platform library interface to hardware performance counters and the cross-platform Dyninst library interface for runtime modification of executable images. The Paradyn and KOJAK projects have made use of this infrastructure to build performance measurement and analysis tools that scale to long-running programs on large parallel and distributed systems and that automate much of the search for performance bottlenecks.« less

  11. Mobile and portable dental services catering to the basic oral health needs of the underserved population in developing countries: a proposed model.

    PubMed

    Ganavadiya, R; Chandrashekar, Br; Goel, P; Hongal, Sg; Jain, M

    2014-05-01

    India is the second most populous country in the world with an extensive rural population (68.8%). Children less than 18 years constitute about 40% of the population. Approximately, 23.5% of the urban population resides in urban slums. The extensive rural population, school children and the urban slum dwellers are denied of even the basic dental services though there is continuous advancement in the field of dentistry. The dentist to population ratio has dramatically improved in the last one to two decades with no significant improvement in the oral health status of the general population. The various studies have revealed an increasing trend in oral diseases in the recent times especially among this underserved population. Alternate strategies have to be thought about rather than the traditional oral health-care delivery through private dentists on fee for service basis. Mobile and portable dental services are a viable option to take the sophisticated oral health services to the doorsteps of the underserved population. The databases were searched for publications from 1900 to the present (2013) using terms such as Mobile dental services, Portable dental services and Mobile and portable dental services with key articles obtained primarily from MEDLINE. This paper reviews the published and unpublished literature from different sources on the various mobile dental service programs successfully implemented in some developed and developing countries. Though the mobile and portable systems have some practical difficulties like financial considerations, they still seem to be the only way to reach every section of the community in the absence of national oral health policy and organized school dental health programs in India. The material for the present review was obtained mainly by searching the biomedical databases for primary research material using the search engine with key words such as mobile and/or portable dental services in developed and developing countries (adding each of these terms in a sequential order). Based on the review of the programs successfully implemented in developed countries, we propose a model to cater to the basic oral health needs of an extensive underserved population in India that may be pilot tested. The increasing dental manpower can best be utilized for the promotion of oral health through mobile and portable dental services. The professional dental organizations should have a strong motive to translate this into reality.

  12. Virtual environment display for a 3D audio room simulation

    NASA Astrophysics Data System (ADS)

    Chapin, William L.; Foster, Scott

    1992-06-01

    Recent developments in virtual 3D audio and synthetic aural environments have produced a complex acoustical room simulation. The acoustical simulation models a room with walls, ceiling, and floor of selected sound reflecting/absorbing characteristics and unlimited independent localizable sound sources. This non-visual acoustic simulation, implemented with 4 audio ConvolvotronsTM by Crystal River Engineering and coupled to the listener with a Poihemus IsotrakTM, tracking the listener's head position and orientation, and stereo headphones returning binaural sound, is quite compelling to most listeners with eyes closed. This immersive effect should be reinforced when properly integrated into a full, multi-sensory virtual environment presentation. This paper discusses the design of an interactive, visual virtual environment, complementing the acoustic model and specified to: 1) allow the listener to freely move about the space, a room of manipulable size, shape, and audio character, while interactively relocating the sound sources; 2) reinforce the listener's feeling of telepresence into the acoustical environment with visual and proprioceptive sensations; 3) enhance the audio with the graphic and interactive components, rather than overwhelm or reduce it; and 4) serve as a research testbed and technology transfer demonstration. The hardware/software design of two demonstration systems, one installed and one portable, are discussed through the development of four iterative configurations. The installed system implements a head-coupled, wide-angle, stereo-optic tracker/viewer and multi-computer simulation control. The portable demonstration system implements a head-mounted wide-angle, stereo-optic display, separate head and pointer electro-magnetic position trackers, a heterogeneous parallel graphics processing system, and object oriented C++ program code.

  13. ITA, a portable program for the interactive analysis of data from tracer experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wootton, R.; Ashley, K.

    ITA is a portable program for analyzing data from tracer experiments, most of the mathematical and graphical work being carried out by subroutines from the NAG and DASL libraries. The program can be used in batch or interactive mode, commands being typed in an English-like language, in free format. Data can be entered from a terminal keyboard or read from a file, and can be validated by printing or plotting them. Erroneous values can be corrected by appropriate editing. Analysis can involve elementary statistics, multiple-isotope crossover corrections, convolution or deconvolution, polyexponential curve-fitting, spline interpolation and/or compartmental analysis. On those installationsmore » with the appropriate hardware, high-resolution graphs can be drawn.« less

  14. Portable long trace profiler: Concept and solution

    NASA Astrophysics Data System (ADS)

    Qian, Shinan; Takacs, Peter; Sostero, Giovanni; Cocco, Daniele

    2001-08-01

    Since the early development of the penta-prism long trace profiler (LTP) and the in situ LTP, and following the completion of the first in situ distortion profile measurements at Sincrotrone Trieste (ELETTRA) in Italy in 1995, a concept was developed for a compact, portable LTP with the following characteristics: easily installed on synchrotron radiation beam lines, easily carried to different laboratories around the world for measurements and calibration, convenient for use in evaluating the LTP as an in-process tool in the optical workshop, and convenient for use in temporarily installation as required by other special applications. The initial design of a compact LTP optical head was made at ELETTRA in 1995. Since 1997 further efforts to reduce the optical head size and weight, and to improve measurement stability have been made at Brookhaven National Laboratory. This article introduces the following solutions and accomplishments for the portable LTP: (1) a new design for a compact and very stable optical head, (2) the use of a small detector connected to a laptop computer directly via an enhanced parallel port, and there is no extra frame grabber interface and control box, (3) a customized small mechanical slide that uses a compact motor with a connector-sized motor controller, and (4) the use of a laptop computer system. These solutions make the portable LTP able to be packed into two laptop-size cases: one for the computer and one for the rest of the system.

  15. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    NASA Astrophysics Data System (ADS)

    Dey, T.; Rodrigue, P.

    2015-07-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computational intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel and the 103 to 104 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector and in a second step the sum of the radiological path taking into account attenuation is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization either on the Xeon Phi and the Host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware specific intrinsic instructions (so-called `intrinsics') to allow manually-optimized vectorization. For parallelization either OpenMP and ISPC tasking (based on pthreads) are evaluated.Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the Host implementations. The examination showed that a reasonable speedup of sensitivity map calculation could be achieved on the Xeon Phi either by a portable or a hardware specific implementation.

  16. The BASIC Instructional Program: Conversion into MAINSAIL Language.

    ERIC Educational Resources Information Center

    Dageforde, Mary L.

    This report summarizes the rewriting of the BASIC Instructional Program (BIP) (a "hands-on laboratory" that teaches elementary programming in the BASIC language) from SAIL (a programming language available only on PDP-10 computers) into MAINSAIL (a language designed for portability on a broad class of computers). Four sections contain…

  17. A gas flow indicator for portable life support systems

    NASA Technical Reports Server (NTRS)

    Bass, R. L., III; Schroeder, E. C.

    1975-01-01

    A three-part program was conducted to develop a gas flow indicator (GFI) to monitor ventilation flow in a portable life support system. The first program phase identified concepts which could potentially meet the GFI requirements. In the second phase, a working breadboard GFI, based on the concept of a pressure sensing diaphragm-aneroid assembly connected to a venturi, was constructed and tested. Extensive testing of the breadboard GFI indicated that the design would meet all NASA requirements including eliminating problems experienced with the ventilation flow sensor used in the Apollo program. In the third program phase, an optimized GFI was designed by utilizing test data obtained on the breadboard unit. A prototype unit was constructed using prototype materials and fabrication techniques, and performance tests indicated that the prototype GFI met or exceeded all requirements.

  18. Advanced Technology for Portable Personal Visualization.

    DTIC Science & Technology

    1992-06-01

    interactive radiosity . 6 Advanced Technology for Portable Personal Visualization Progress Report January-June 1992 9 2.5 Virtual-Environment Ultrasound...the system, with support for textures, model partitioning, more complex radiosity emitters, and the replacement of model parts with objects from our...model libraries. "* Add real-time, interactive radiosity to the display program on Pixel-Planes 5. "* Move the real-time model mesh-generation to the

  19. U.S.-MEXICO BORDER PROGRAM ARIZONA BORDER STUDY--STANDARD OPERATING PROCEDURE FOR OPERATION, CALIBRATION, AND ROUTINE USE OF THE SPECTRACE 9000 FIELD PORTABLE X-RAY FLUORESCENCE ANALYZER (UA-L-10.1)

    EPA Science Inventory

    The purpose of this SOP is to describe the procedures for operating and calibrating the Spectrace 9000 field portable X-ray fluorescence analyzer. This procedure applies to the determination of metal concentrations in samples during the Arizona NHEXAS project and the Border stud...

  20. Partitioning problems in parallel, pipelined and distributed computing

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1985-01-01

    The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.

  1. Towards a Modular, Robust, and Portable Sensing Platform for Biological and Point of Care Diagnostics

    DTIC Science & Technology

    2013-07-01

    drugs at potentially very low concentrations in a variety of complex media such as saliva, blood, urine , and other bodily fluids. These handheld...flow assays, such as pregnancy tests, disease and drug abuse screens, and blood protein markers, exhibit widespread use. Recent parallel advances...gadgets have the potential to lower the cost of diagnosis and save immense amounts of time by removing the need to collect, preserve, and ship samples

  2. Lead zirconate titanate nanowire textile nanogenerator for wearable energy-harvesting and self-powered devices.

    PubMed

    Wu, Weiwei; Bai, Suo; Yuan, Miaomiao; Qin, Yong; Wang, Zhong Lin; Jing, Tao

    2012-07-24

    Wearable nanogenerators are of vital importance to portable energy-harvesting and personal electronics. Here we report a method to synthesize a lead zirconate titanate textile in which nanowires are parallel with each other and a procedure to make it into flexible and wearable nanogenerators. The nanogenerator can generate 6 V output voltage and 45 nA output current, which are large enough to power a liquid crystal display and a UV sensor.

  3. Birth of the Program for Array Seismic Studies of the Continental Lithosphere (PASSCAL)

    NASA Astrophysics Data System (ADS)

    James, D. E.; Sacks, I. S.

    2002-05-01

    As recently as 1984 institutions doing portable seismology depended upon their own complement of instruments, almost all designed and built in-house, and all of limited recording capability and flexibility. No data standards existed. Around 1980 the National Research Council (NRC) of the National Academy of Sciences (NAS), with National Science Foundation (NSF) support, empanelled a committee to study a major new initiative in Seismic Studies of the Continental Lithosphere (SSCL). The SSCL report in 1983 recommended that substantial numbers (1000 or more) of new generation digital seismographs be acquired for 3-D high resolution imaging of the continental lithosphere. Recommendations of the SSCL committee dovetailed with other NRC/NAS and NSF reports that highlighted imaging of the continental lithosphere as an area of highest priority. For the first time in the history of portable seismology the question asked was "What do seismologists need to do the job right?" A grassroots effort was undertaken to define instrumentation and data standards for a powerful new set of modern seismic research tools to serve the national seismological community. In the spring and fall of 1983 NSF and IASPEI sponsored workshops were convened to develop specifications for the design of a new generation of portable instrumentation. PASSCAL was the outgrowth of these seminal studies and workshops. The first step toward the formal formation of PASSCAL began with an ad-hoc organizing committee, comprised largely of the members of the NAS lithospheric seismology panel, convened by the authors at Carnegie Institution in Washington in November 1983. From that meeting emerged plans and promises of NSF support for an open organizational meeting to be held in January 1984, in Madison, Wisconsin. By the end of the two-day Madison meeting PASSCAL and an official consortium of seismological institutions for portable seismology were realities. Shortly after, PASSCAL merged with the complementary Global Seismic Network (GSN) under the overall umbrella of the Incorporated Research Institutions for Seismology (IRIS) consortium. Pre-startup funding for PASSCAL was provided by NSF via a so-called "Phase Zero" grant to the Carnegie Institution in June, 1984, to initiate design of new digital instrumentation and to facilitate preparation of the PASSCAL Program Plan. A working group met at Princeton in July 1984 to draft the PASSCAL Program Plan for the IRIS 10-year proposal to NSF, submitted in December 1984. PASSCAL functions as a national facility for seismological research, acquiring and maintaining a large complement of state-of-the-art portable instrumentation for scientists in member institutions. Within a year of its formation, PASSCAL had retained an engineer/program manager and begun the specification process for the manufacture and acquisition of a national instrumentation facility of broadband and short period seismographs. Instrument centers staffed by hardware and software engineers were established to maintain and distribute equipment, and to assist in field installations. By the late 1980s substantial volumes of standardized digital data were flowing from portable experiments to the archives of the newly formed Data Management Center (DMC). Portable broadband sensors built to PASSCAL specifications came on the market in 1989 and transformed the nature of portable experiments by expanding the technical capabilities of portable stations almost to the level of permanent global stations. Today PASSCAL through the instrument center at New Mexico Tech supports dozens of experiments worldwide for high resolution imaging of the earth's interior on all scales.

  4. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

    2001-01-01

    This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.

  5. Skylab astronaut life support assembly

    NASA Technical Reports Server (NTRS)

    Brown, J. T.

    1972-01-01

    A comparative study was performed to define an optimum portable life support system for suited operations inside and outside the Skylab Program. Emphasis was placed on utilization of qualified equipment, modified versions of qualified equipment, and new systems made up to state-of-the-art components. Outlined are the mission constraints, operational modes, and evaluation ground rules by which the Skylab portable life support system was selected and the resulting design.

  6. Self-Learning through Programmed Learning in Distance Mode.

    ERIC Educational Resources Information Center

    Rao, D. Prakasa; Reddy, B. Sudhakar

    2002-01-01

    Presents the characteristics and development of self-learning material (SLM) in distance education. Discusses teaching with programmed learning; structure of SLM; and how SLM helps in self-study. Discusses the advantages of print materials as accompanying programmed instruction, because they are portable, well-structured, compact, and easily…

  7. Little Package, Big Deal.

    ERIC Educational Resources Information Center

    Campbell, Joseph K.

    1979-01-01

    Describes New York State's extension experience in using the programable calculator, a portable pocket-size computer, to solve many of the problems that central computers now handle. Subscription services to programs written for the Texas Instruments TI-59 programable calculator are provided by both Cornell and Iowa State Universities. (MF)

  8. A Data Parallel Multizone Navier-Stokes Code

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)

    1995-01-01

    We have developed a data parallel multizone compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the "chimera" approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. The design choices can be summarized as: 1. finite differences on structured grids; 2. implicit time-stepping with either distributed solves or data motion and local solves; 3. sequential stepping through multiple zones with interzone data transfer via a distributed data structure. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran (HPF). One interesting feature is the issue of turbulence modeling, where the architecture of a parallel machine makes the use of an algebraic turbulence model awkward, whereas models based on transport equations are more natural. We will present some performance figures for the code on the CM-5, and consider the issues involved in transitioning the code to HPF for portability to other parallel platforms.

  9. Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density

    NASA Astrophysics Data System (ADS)

    Hohl, A.; Delmelle, E. M.; Tang, W.

    2015-07-01

    Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octtree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data-points that are within the spatial and temporal kernel bandwidths. Then, we quantify computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable of other space-time analytical tests.

  10. ZENO: N-body and SPH Simulation Codes

    NASA Astrophysics Data System (ADS)

    Barnes, Joshua E.

    2011-02-01

    The ZENO software package integrates N-body and SPH simulation codes with a large array of programs to generate initial conditions and analyze numerical simulations. Written in C, the ZENO system is portable between Mac, Linux, and Unix platforms. It is in active use at the Institute for Astronomy (IfA), at NRAO, and possibly elsewhere. Zeno programs can perform a wide range of simulation and analysis tasks. While many of these programs were first created for specific projects, they embody algorithms of general applicability and embrace a modular design strategy, so existing code is easily applied to new tasks. Major elements of the system include: Structured data file utilities facilitate basic operations on binary data, including import/export of ZENO data to other systems.Snapshot generation routines create particle distributions with various properties. Systems with user-specified density profiles can be realized in collisionless or gaseous form; multiple spherical and disk components may be set up in mutual equilibrium.Snapshot manipulation routines permit the user to sift, sort, and combine particle arrays, translate and rotate particle configurations, and assign new values to data fields associated with each particle.Simulation codes include both pure N-body and combined N-body/SPH programs: Pure N-body codes are available in both uniprocessor and parallel versions.SPH codes offer a wide range of options for gas physics, including isothermal, adiabatic, and radiating models. Snapshot analysis programs calculate temporal averages, evaluate particle statistics, measure shapes and density profiles, compute kinematic properties, and identify and track objects in particle distributions.Visualization programs generate interactive displays and produce still images and videos of particle distributions; the user may specify arbitrary color schemes and viewing transformations.

  11. Architecture Adaptive Computing Environment

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    2006-01-01

    Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple- instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.

  12. A Portable Parallel Implementation of the U.S. Navy Layered Ocean Model

    DTIC Science & Technology

    1995-01-01

    Wallcraft, PhD (I.C. 1981) Planning Systems Inc. & P. R. Moore, PhD (Camb. 1971) IC Dept. Math. DR Moore 1° Encontro de Metodos Numericos...Kendall Square, Hypercube, D R Moore 1 ° Encontro de Metodos Numericos para Equacöes de Derivadas Parciais A. J. Wallcraft IC Mathematics...chips: Chips Machine DEC Alpha CrayT3D/E SUN Sparc Fujitsu AP1000 Intel 860 Paragon D R Moore 1° Encontro de Metodos Numericos para Equacöes

  13. Social media in dermatology: moving to Web 2.0.

    PubMed

    Travers, Robin L

    2012-09-01

    Patient use of social media platforms for accessing medical information has accelerated in parallel with overall use of the Internet. Dermatologists must keep pace with our patients' use of these media through either passive or active means are outlined in detail for 4 specific social media outlets. A 5-step plan for active engagement in social media applications is presented. Implications for medical professionalism, Health Insurance Portability and Accountability Act compliance, and crisis management are discussed. Copyright © 2012. Published by Elsevier Inc.

  14. DepositScan, a Scanning Program to Measure Spray Deposition Distributions

    USDA-ARS?s Scientific Manuscript database

    DepositScan, a scanning program was developed to quickly measure spray deposit distributions on water sensitive papers or Kromekote cards which are widely used for determinations of pesticide spray deposition quality on target areas. The program is installed in a portable computer and works with a ...

  15. Evolution of a Family Nurse Practitioner Program to Improve Primary Care Distribution

    ERIC Educational Resources Information Center

    Andrus, Len Hughes; Fenley, Mary D.

    1976-01-01

    Describes a Family Nurse Practitioner Program that has effectively improved the distribution of primary health care manpower in rural areas. Program characteristics include selection of personnel from areas of need, decentralization of clinical and didactic training sites, competency-based portable curriculum, and circuit-riding institutionally…

  16. Mobile and Portable Dental Services Catering to the Basic Oral Health Needs of the Underserved Population in Developing Countries: A Proposed Model

    PubMed Central

    Ganavadiya, R; Chandrashekar, BR; Goel, P; Hongal, SG; Jain, M

    2014-01-01

    India is the second most populous country in the world with an extensive rural population (68.8%). Children less than 18 years constitute about 40% of the population. Approximately, 23.5% of the urban population resides in urban slums. The extensive rural population, school children and the urban slum dwellers are denied of even the basic dental services though there is continuous advancement in the field of dentistry. The dentist to population ratio has dramatically improved in the last one to two decades with no significant improvement in the oral health status of the general population. The various studies have revealed an increasing trend in oral diseases in the recent times especially among this underserved population. Alternate strategies have to be thought about rather than the traditional oral health-care delivery through private dentists on fee for service basis. Mobile and portable dental services are a viable option to take the sophisticated oral health services to the doorsteps of the underserved population. The databases were searched for publications from 1900 to the present (2013) using terms such as Mobile dental services, Portable dental services and Mobile and portable dental services with key articles obtained primarily from MEDLINE. This paper reviews the published and unpublished literature from different sources on the various mobile dental service programs successfully implemented in some developed and developing countries. Though the mobile and portable systems have some practical difficulties like financial considerations, they still seem to be the only way to reach every section of the community in the absence of national oral health policy and organized school dental health programs in India. The material for the present review was obtained mainly by searching the biomedical databases for primary research material using the search engine with key words such as mobile and/or portable dental services in developed and developing countries (adding each of these terms in a sequential order). Based on the review of the programs successfully implemented in developed countries, we propose a model to cater to the basic oral health needs of an extensive underserved population in India that may be pilot tested. The increasing dental manpower can best be utilized for the promotion of oral health through mobile and portable dental services. The professional dental organizations should have a strong motive to translate this into reality. PMID:24971198

  17. Stand-Sit Microchip for High-Throughput, Multiplexed Analysis of Single Cancer Cells.

    PubMed

    Ramirez, Lisa; Herschkowitz, Jason I; Wang, Jun

    2016-09-01

    Cellular heterogeneity in function and response to therapeutics has been a major challenge in cancer treatment. The complex nature of tumor systems calls for the development of advanced multiplexed single-cell tools that can address the heterogeneity issue. However, to date such tools are only available in a laboratory setting and don't have the portability to meet the needs in point-of-care cancer diagnostics. Towards that application, we have developed a portable single-cell system that is comprised of a microchip and an adjustable clamp, so on-chip operation only needs pipetting and adjusting of clamping force. Up to 10 proteins can be quantitated from each cell with hundreds of single-cell assays performed in parallel from one chip operation. We validated the technology and analyzed the oncogenic signatures of cancer stem cells by quantitating both aldehyde dehydrogenase (ALDH) activities and 5 signaling proteins in single MDA-MB-231 breast cancer cells. The technology has also been used to investigate the PI3K pathway activities of brain cancer cells expressing mutant epidermal growth factor receptor (EGFR) after drug intervention targeting EGFR signaling. Our portable single-cell system will potentially have broad application in the preclinical and clinical settings for cancer diagnosis in the future.

  18. Capillary whole blood testing by a new portable monitor. Comparison with standard determination of the international normalized ratio.

    PubMed

    de Miguel, Dunia; Burgaleta, Carmen; Reyes, Eduardo; Pascual, Teresa

    2003-07-01

    We evaluated a new portable monitor (AvoSure PT PRO, Menarini Diagnostics, Firenze, Italy) developed to test the prothrombin time in capillary blood and plasma by comparing it with the standard laboratory determination. We studied 62 patients receiving acenocoumarol therapy. The international normalized ratio (INR) in capillary blood was analyzed by 2 methods: AvoSure PT PRO and Thrombotrack Nycomed Analyzer (Axis-Shield, Dundee, Scotland). Parallel studies were performed in plasma samples by a reference method using the Behring Coagulation Timer (Behring Diagnostics, Marburg, Germany). Plasma samples also were tested with the AvoSure PT PRO. Correlation was good for INR values for capillary blood and plasma samples by AvoSure PT PRO and our reference method (R2 = 0.8596) and for capillary blood samples tested by the AvoSure PT PRO and Thrombotrack Nycomed Analyzer (R2 = 0.8875). The correlation for INR in capillary blood and plasma samples by AvoSure PT PRO was 0.6939 (P < .0004). Capillary blood determinations are rapid and effective for monitoring oral anticoagulation therapy and have a high correlation to plasma determinations. AvoSure PT PRO is accurate for controlling INR in plasma and capillary blood samples, may be used in outpatient clinics, and has advantages over previous portable monitors.

  19. Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

    NASA Astrophysics Data System (ADS)

    Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

    We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.

  20. An object-oriented approach to nested data parallelism

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.; Chatterjee, Siddhartha

    1994-01-01

    This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.

  1. Ruggedized Portable Instrumentation Package for Marine Mammal Evoked Potential Hearing Measurements (DURIP)

    DTIC Science & Technology

    2011-09-30

    NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) Hawaii Institute of Marine Biology,Marine Mammal...Program Hawaii Institute of Marine Biology P.O. Box 1106 Kailua, Hawaii 96734 phone: (808) 247-5297 fax: (808) 247-5831 email: nachtiga...portable equipment was used at the Hilo Stranding Center to obtain the audiograms. Ready and available for RIMPAC excercises. 2 IMPACT

  2. Portable Oxygen Subsystem (POS). [for space shuttles

    NASA Technical Reports Server (NTRS)

    1975-01-01

    Concept selection, design, fabrication, and testing of a Portable Subsystem (POS) for use in space shuttle operations are described. Tradeoff analyses were conducted to determine the POS concept for fabrication and testing. The fabricated POS was subjected to unmanned and manned tests to verify compliance with statement of work requirements. The POS used in the development program described herein met requirements for the three operational modes -- prebreathing, contaminated cabin, and personnel rescue system operations.

  3. Analysis of Java Client/Server and Web Programming Tools for Development of Educational Systems.

    ERIC Educational Resources Information Center

    Muldner, Tomasz

    This paper provides an analysis of old and new programming tools for development of client/server programs, particularly World Wide Web-based programs. The focus is on development of educational systems that use interactive shared workspaces to provide portable and expandable solutions. The paper begins with a short description of relevant terms.…

  4. Adapting high-level language programs for parallel processing using data flow

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  5. APINetworks Java. A Java approach to the efficient treatment of large-scale complex networks

    NASA Astrophysics Data System (ADS)

    Muñoz-Caro, Camelia; Niño, Alfonso; Reyes, Sebastián; Castillo, Miriam

    2016-10-01

    We present a new version of the core structural package of our Application Programming Interface, APINetworks, for the treatment of complex networks in arbitrary computational environments. The new version is written in Java and presents several advantages over the previous C++ version: the portability of the Java code, the easiness of object-oriented design implementations, and the simplicity of memory management. In addition, some additional data structures are introduced for storing the sets of nodes and edges. Also, by resorting to the different garbage collectors currently available in the JVM the Java version is much more efficient than the C++ one with respect to memory management. In particular, the G1 collector is the most efficient one because of the parallel execution of G1 and the Java application. Using G1, APINetworks Java outperforms the C++ version and the well-known NetworkX and JGraphT packages in the building and BFS traversal of linear and complete networks. The better memory management of the present version allows for the modeling of much larger networks.

  6. Collectively loading programs in a multiple program multiple data environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

    Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the programmore » needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.« less

  7. Shuttle/ISS EMU Failure History and the Impact on Advanced EMU Portable Life Support System (PLSS) Design

    NASA Technical Reports Server (NTRS)

    Campbell, Colin

    2015-01-01

    As the Shuttle/ISS EMU Program exceeds 35 years in duration and is still supporting the needs of the International Space Station (ISS), a critical benefit of such a long running program with thorough documentation of system and component failures is the ability to study and learn from those failures when considering the design of the next generation space suit. Study of the subject failure history leads to changes in the Advanced EMU Portable Life Support System (PLSS) schematic, selected component technologies, as well as the planned manner of ground testing. This paper reviews the Shuttle/ISS EMU failure history and discusses the implications to the AEMU PLSS.

  8. Lithium ion rechargeable systems studies

    NASA Astrophysics Data System (ADS)

    Levy, Samuel C.; Lasasse, Robert R.; Cygan, Randall T.; Voigt, James A.

    Lithium ion systems, although relatively new, have attracted much interest worldwide. Their high energy density, long cycle life and relative safety, compared with metallic lithium rechargeable systems, make them prime candidates for powering portable electronic equipment. Although lithium ion cells are presently used in a few consumer devices, e.g., portable phones, camcorders, and laptop computers, there is room for considerable improvement in their performance. Specific areas that need to be addressed include: (1) carbon anode-increase reversible capacity, and minimize passivation; (2) cathode-extend cycle life, improve rate capability, and increase capacity. There are several programs ongoing at Sandia National Laboratories which are investigating means of achieving the stated objectives in these specific areas. This paper will review these programs.

  9. FORTRAN tools

    NASA Technical Reports Server (NTRS)

    Presser, L.

    1978-01-01

    An integrated set of FORTRAN tools that are commercially available is described. The basic purpose of various tools is summarized and their economic impact highlighted. The areas addressed by these tools include: code auditing, error detection, program portability, program instrumentation, documentation, clerical aids, and quality assurance.

  10. PERFORMANCE VERIFICATION TEST FOR FIELD-PORTABLE MEASUREMENTS OF LEAD IN DUST

    EPA Science Inventory

    The US Environmental Protection Agency (EPA) Environmental Technology Verification (ETV) program (www.epa.jzov/etv) conducts performance verification tests of technologies used for the characterization and monitoring of contaminated media. The program exists to provide high-quali...

  11. Portable and programmable clinical EOG diagnostic system.

    PubMed

    Chen, S C; Tsai, T T; Luo, C H

    2000-01-01

    Monitoring eye movements is clinically important in diagnosis of diseases of the central nervous system. Electrooculography (EOG) is one method of obtaining such records which uses skin electrodes, and utilizes the anterior posterior polarization of the eye. A new EOG diagnostic system has been developed that utilizes two off-the-shelf portable notebook computers, one projector and simple electronic hardware. It can be operated under Windows 95, 98, NT, and has significant advantages over any other similar equipment, including programmability, portability, improved safety and low cost. Especially, portability of the instrument is extremely important for acutely ill or handicapped patients. The purpose of this paper is to introduce the techniques of computer animation, data acquisition, real time analysis of measured data, and database management to implement a portable, programmable and inexpensive contacting EOG instrument. It is very convenient to replace the present expensive, inflexible and large-sized commercially available EOG instruments. A lot of interesting stimulation patterns for clinical application can be created easily in different shape, time sequence, and colour by programming in Delphi language. With the help of Winstar (a software package that is used to control I/O and interrupt functions of the computer under Windows 95, 98, NT), the I/O communication between two notebook computers and A/D interface module can be effectively programmed. In addition, the new EOG diagnostic system is battery operated and it has the advantages of low noise as well as isolation from electricity. Two kinds of EOG tests, pursuit and saccade, were performed on 20 normal subjects with this new portable and programmable instrument. Based on the test result, the performance of the new instrument is superior to the other commercially available instruments. In conclusion, we hope that it will be more convenient for doctors and researchers to do the clinical EOG diagnosis and basic medical science research by using this new creation.

  12. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    NASA Astrophysics Data System (ADS)

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution, however, the efficiency in terms of the computing resource usage decreases with increasing the number of processors used in the parallel computing.

  13. IOPA: I/O-aware parallelism adaption for parallel programs

    PubMed Central

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236

  14. IOPA: I/O-aware parallelism adaption for parallel programs.

    PubMed

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.

  15. User's guide for Northeast Stand Exam Program (NEST Version 2.1).

    Treesearch

    Thomas M. Schuler; Brian T. Simpson

    1991-01-01

    Explains the Northeast Stand Exam (NEST Version 2.1) program. The NEST program was designed for use on the Polycorder 600 Series electronic portable data recorder to record data collected from the standard permanent plot as described by the Stand Culture and Stand Establishment Working Groups of the Northeastern Forest Experiment Station.

  16. MrGrid: A Portable Grid Based Molecular Replacement Pipeline

    PubMed Central

    Reboul, Cyril F.; Androulakis, Steve G.; Phan, Jennifer M. N.; Whisstock, James C.; Goscinski, Wojtek J.; Abramson, David; Buckle, Ashley M.

    2010-01-01

    Background The crystallographic determination of protein structures can be computationally demanding and for difficult cases can benefit from user-friendly interfaces to high-performance computing resources. Molecular replacement (MR) is a popular protein crystallographic technique that exploits the structural similarity between proteins that share some sequence similarity. But the need to trial permutations of search models, space group symmetries and other parameters makes MR time- and labour-intensive. However, MR calculations are embarrassingly parallel and thus ideally suited to distributed computing. In order to address this problem we have developed MrGrid, web-based software that allows multiple MR calculations to be executed across a grid of networked computers, allowing high-throughput MR. Methodology/Principal Findings MrGrid is a portable web based application written in Java/JSP and Ruby, and taking advantage of Apple Xgrid technology. Designed to interface with a user defined Xgrid resource the package manages the distribution of multiple MR runs to the available nodes on the Xgrid. We evaluated MrGrid using 10 different protein test cases on a network of 13 computers, and achieved an average speed up factor of 5.69. Conclusions MrGrid enables the user to retrieve and manage the results of tens to hundreds of MR calculations quickly and via a single web interface, as well as broadening the range of strategies that can be attempted. This high-throughput approach allows parameter sweeps to be performed in parallel, improving the chances of MR success. PMID:20386612

  17. High Intensity Lights

    NASA Technical Reports Server (NTRS)

    1983-01-01

    Xenon arc lamps developed during the Apollo program by Streamlight, Inc. are the basis for commercial flashlights and emergency handlights. These are some of the brightest portable lights made. They throw a light some 50 times brighter than automobile high beams and are primarily used by police and military. The light penetrates fog and smoke and returns less back-scatter light. They are operated on portable power packs as boat and auto batteries. An infrared model produces totally invisible light for covert surveillance.

  18. Parallel language constructs for tensor product computations on loosely coupled architectures

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Vanrosendale, John

    1989-01-01

    Distributed memory architectures offer high levels of performance and flexibility, but have proven awkard to program. Current languages for nonshared memory architectures provide a relatively low level programming environment, and are poorly suited to modular programming, and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. Tensor product array computations are focused on along with a simple but important class of numerical algorithms. The problem of programming 1-D kernal routines is focused on first, such as parallel tridiagonal solvers, and then how such parallel kernels can be combined to form parallel tensor product algorithms is examined.

  19. Digital Remix

    ERIC Educational Resources Information Center

    Roach, Ronald

    2010-01-01

    There's no doubt laptop programs remain important to many institutions, particularly to those that consider social equity an important value in their technology programs. Now, the transition to mobile computing, which puts smaller and often more versatile portable devices in the marketplace, has inspired college and university administrators to…

  20. Methods for design and evaluation of parallel computating systems (The PISCES project)

    NASA Technical Reports Server (NTRS)

    Pratt, Terrence W.; Wise, Robert; Haught, Mary JO

    1989-01-01

    The PISCES project started in 1984 under the sponsorship of the NASA Computational Structural Mechanics (CSM) program. A PISCES 1 programming environment and parallel FORTRAN were implemented in 1984 for the DEC VAX (using UNIX processes to simulate parallel processes). This system was used for experimentation with parallel programs for scientific applications and AI (dynamic scene analysis) applications. PISCES 1 was ported to a network of Apollo workstations by N. Fitzgerald.

  1. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Writtenmore » in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2 32 = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.« less

  2. Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

    NASA Astrophysics Data System (ADS)

    Georgiev, K.; Zlatev, Z.

    2010-11-01

    The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model computational domain covers Europe and some neighbour parts belong to the Atlantic Ocean, Asia and Africa. If DEM model is to be applied by using fine grids, then its discretization leads to a huge computational problem. This implies that such a model as DEM must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, some comparison results of running of this model on different kind of vector (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) will be presented. The main idea in the parallel version of DEM is domain partitioning approach. Discussions according to the effective use of the cache and hierarchical memories of the modern computers as well as the performance, speed-ups and efficiency achieved will be done. The parallel code of DEM, created by using MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kind of vector and parallel computers. Some important applications of the computer model output are presented in short.

  3. Apparatus to collect, classify, concentrate, and characterize gas-borne particles

    DOEpatents

    Rader, Daniel J.; Torczynski, John R.; Wally, Karl; Brockmann, John E.

    2002-01-01

    An aerosol lab-on-a-chip (ALOC) integrates one or more of a variety of aerosol collection, classification, concentration (enrichment), and characterization processes onto a single substrate or layered stack of such substrates. By taking advantage of modern micro-machining capabilities, an entire suite of discrete laboratory aerosol handling and characterization techniques can be combined in a single portable device that can provide a wealth of data on the aerosol being sampled. The ALOC offers parallel characterization techniques and close proximity of the various characterization modules helps ensure that the same aerosol is available to all devices (dramatically reducing sampling and transport errors). Micro-machine fabrication of the ALOC significantly reduces unit costs relative to existing technology, and enables the fabrication of small, portable ALOC devices, as well as the potential for rugged design to allow operation in harsh environments. Miniaturization also offers the potential of working with smaller particle sizes and lower pressure drops (leading to reduction of power consumption).

  4. Configurable analog-digital conversion using the neural engineering framework

    PubMed Central

    Mayr, Christian G.; Partzsch, Johannes; Noack, Marko; Schüffny, Rene

    2014-01-01

    Efficient Analog-Digital Converters (ADC) are one of the mainstays of mixed-signal integrated circuit design. Besides the conventional ADCs used in mainstream ICs, there have been various attempts in the past to utilize neuromorphic networks to accomplish an efficient crossing between analog and digital domains, i.e., to build neurally inspired ADCs. Generally, these have suffered from the same problems as conventional ADCs, that is they require high-precision, handcrafted analog circuits and are thus not technology portable. In this paper, we present an ADC based on the Neural Engineering Framework (NEF). It carries out a large fraction of the overall ADC process in the digital domain, i.e., it is easily portable across technologies. The analog-digital conversion takes full advantage of the high degree of parallelism inherent in neuromorphic networks, making for a very scalable ADC. In addition, it has a number of features not commonly found in conventional ADCs, such as a runtime reconfigurability of the ADC sampling rate, resolution and transfer characteristic. PMID:25100933

  5. Crime scene investigations using portable, non-destructive space exploration technology

    NASA Technical Reports Server (NTRS)

    Trombka, Jacob I.; Schweitzer, Jeffrey; Selavka, Carl; Dale, Mark; Gahn, Norman; Floyd, Samuel; Marie, James; Hobson, Maritza; Zeosky, Jerry; Martin, Ken; hide

    2002-01-01

    The National Institute of Justice (NIJ) and the National Aeronautics and Space Administration's (NASAs) Goddard Space Flight Center (GSFC) have teamed up to explore the use of NASA developed technologies to help criminal justice agencies and professionals solve crimes. The objective of the program is to produce instruments and communication networks that have application within both NASA's space program and NIJ programs with state and local forensic laboratories. A working group of NASA scientists and law enforcement professionals has been established to develop and implement a feasibility demonstration program. Specifically, the group has focused its efforts on identifying gunpowder and primer residue, blood, and semen at crime scenes. Non-destructive elemental composition identification methods are carried out using portable X-ray fluorescence (XRF) systems. These systems are similar to those being developed for planetary exploration programs. A breadboard model of a portable XRF system has been constructed for these tests using room temperature silicon and cadmium-zinc telluride (CZT) detectors. Preliminary tests have been completed with gunshot residue (GSR), blood-spatter and semen samples. Many of the element composition lines have been identified. Studies to determine the minimum detectable limits needed for the analyses of GSR, blood and semen in the crime scene environment have been initiated and preliminary results obtained. Furthermore, a database made up of the inorganic composition of GSR is being developed. Using data obtained from the open literature of the elemental composition of barium (Ba) and antimony (Sb) in handswipes of GSR, we believe that there may be a unique GSR signature based on the Sb to Ba ratio.

  6. Computer-aided programming for message-passing system; Problems and a solution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, M.Y.; Gajski, D.D.

    1989-12-01

    As the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. Program development tools are necessary since programmers are not able to develop complex parallel programs efficiently. Parallel models of computation, parallelization problems, and tools for computer-aided programming (CAP) are discussed. As an example, a CAP tool that performs scheduling and inserts communication primitives automatically is described. It also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dickerson, James; Camino, Fernando; Irwin, Edward

    Brookhaven Lab and a local school district collaborated to develop a nanotechnology program that brings students “into” labs at Brookhaven’s Center for Functional Nanomaterials through a portable videoconferencing system.

  8. Concurrent Probabilistic Simulation of High Temperature Composite Structural Response

    NASA Technical Reports Server (NTRS)

    Abdi, Frank

    1996-01-01

    A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software 'GENOA' is dedicated to parallel and high speed analysis to perform probabilistic evaluation of high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives which were achieved in performing the development were: (1) Utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of structure, material and processing of high temperature composite affordable; (2) Computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) Implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism, and increasing convergence rates through high- and low-level processor assignment; (4) Creating the framework for Portable Paralleled architecture for the machine independent Multi Instruction Multi Data, (MIMD), Single Instruction Multi Data (SIMD), hybrid and distributed workstation type of computers; and (5) Market evaluation. The results of Phase-2 effort provides a good basis for continuation and warrants Phase-3 government, and industry partnership.

  9. Parallel implementation of an adaptive and parameter-free N-body integrator

    NASA Astrophysics Data System (ADS)

    Pruett, C. David; Ingham, William H.; Herman, Ralph D.

    2011-05-01

    Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summaryProgram title: PNB.f90 Catalogue identifier: AEIK_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 3052 No. of bytes in distributed program, including test data, etc.: 68 600 Distribution format: tar.gz Programming language: Fortran 90 and OpenMPI Computer: All shared or distributed memory parallel processors Operating system: Unix/Linux Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized. RAM: Dependent upon N Classification: 4.3, 4.12, 6.5 Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation. Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree. Running time: 5.1 s for the demo program supplied with the package.

  10. Small Portable PEM Fuel Cell Systems for NASA Exploration Missions

    NASA Technical Reports Server (NTRS)

    Burke, Kenneth A.

    2005-01-01

    Oxygen-Hydrogen PEM-based fuel cell systems are being examined as a portable power source alternative in addition to advanced battery technology. Fuel cell power systems have been used by the Gemini, Apollo, and Space Shuttle programs. These systems have not been portable, but have been integral parts of their spacecraft, and have used reactants from a separate cryogenic supply. These systems typically have been higher in power. They also have had significant ancillary equipment sections that perform the pumping of reactants and coolant through the fuel cell stack and the separation of the product water from the unused reactant streams. The design of small portable fuel cell systems will be a significant departure from these previous designs. These smaller designs will have very limited ancillary equipment, relying on passive techniques for reactant and thermal management, and the reactant storage will be an integral part of the fuel cell system. An analysis of the mass and volume for small portable fuel cell systems was done to evaluate and quantify areas of technological improvement. A review of current fuel cell technology as well as reactant storage and management technology was completed to validate the analysis and to identify technology challenges

  11. Parallel solution of sparse one-dimensional dynamic programming problems

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1989-01-01

    Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.

  12. 76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-26

    ... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Medicare and Medicaid Services [CMS-3180-N2] Food and Drug Administration [Docket No. FDA-2010-N-0308] Pilot Program for Parallel Review of Medical... technologies to participate in a program of parallel FDA-CMS review. The document was published with an...

  13. From MODFLOW-96 to MODFLOW-2005, ParFlow and Others: Updates and a Workflow for Up- and Out- Conversion

    NASA Astrophysics Data System (ADS)

    Pierce, S. A.; Hardesty Lewis, D.

    2017-12-01

    MODFLOW (MF) has served for decades as a de facto standard for groundwater modelling. Despite successive versions, legacy MF-96 simulations are still commonly encountered cases. Such is the case for many of the groundwater availability models of the State of Texas. Unfortunately, even the existence of converters to MF's newer versions has not necessarily stimulated their adoption, let alone re-creation of legacy models. This state of affairs may be due to the unfamiliarity of the modeller with the terminal or the FORTRAN programming language, resulting in an inability to address the minor or major bugs, nuances, or limitations in compilation or execution of the conversion programs. Here, we present a workflow that addresses the above intricacies all the while attempting to maintain portability in implementation. This workflow is contructed in the form of a Bash script and - with the geoscience-oriented in mind - re-presented as a Jupyter notebook. First, one may choose whether this executable will run with POSIX-compliance or with a preference towards the Bash facilities, both widely adopted by operating systems. In the same vein, it attempts to function within minimal command environments, which reduces any dependencies. Finally, it is designed to offer parallelism across as many cores and nodes as necessary or as few as desired, whether upon a personal or super-computer. Underlying this workflow are patches such that antiquated tools may compile and execute upon modern hardware. Also, fixes to long-standing bugs and limitations in the existing MF converters have been prepared. Specifically, support for the conversion of -96- and Horizontal Flow Barrier-coupled simulations has been added. More radically, we have laid the foundations of a conversion utility between MF and a similar modeller, ParFlow. Furthermore, the modular approach followed may extend to an application which inter-operates between arbitrary groundwater simulators. In short, an accessible and portable workflow of the process of up-conversion between MODFLOW versions now avails itself to geoscientists. Updated programs within it may allow for re-use, in whole or in part, legacy simulations. Lastly, a generic inter-operator has been established, invoking the possibility of significant ease in the recycling of groundwater data in the future.

  14. Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

    PubMed

    Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

    2014-07-05

    A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on the large and fine calculation cell (2048(3) grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent scalability to the parallelization, running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of chymotrypsin Inhibitor 2 mutant. The results demonstrate that the massive parallel 3D-RISM program is effective to analyze the hydration properties of the large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.

  15. ENVIRONMENTAL TECHNOLOGY VERIFICATION REPORT - PORTABLE GAS CHROMATOGRAPH ELECTRONIC SENSOR TECHNOLOGY MODEL 4100

    EPA Science Inventory

    The U.S. Environmental Protection Agency, through the Environmental Technology Verification Program, is working to accelerate the acceptance and use of innovative technologies that improve the way the United States manages its environmental problems. As part of this program, the...

  16. ETHOS: Education Through Organ Study. Final Report.

    ERIC Educational Resources Information Center

    Wohl, Seth F.

    This document describes the Education Through Organ Study Program (ETHOS) designed to counteract disaffected students' falling academic achievement, absenteeism and truancy, and negative attitudes toward education and the school. The program attempted to assure success in acquiring musical skill by providing portable desk reed organs, programmed…

  17. Generating performance portable geoscientific simulation code with Firedrake (Invited)

    NASA Astrophysics Data System (ADS)

    Ham, D. A.; Bercea, G.; Cotter, C. J.; Kelly, P. H.; Loriant, N.; Luporini, F.; McRae, A. T.; Mitchell, L.; Rathgeber, F.

    2013-12-01

    This presentation will demonstrate how a change in simulation programming paradigm can be exploited to deliver sophisticated simulation capability which is far easier to programme than are conventional models, is capable of exploiting different emerging parallel hardware, and is tailored to the specific needs of geoscientific simulation. Geoscientific simulation represents a grand challenge computational task: many of the largest computers in the world are tasked with this field, and the requirements of resolution and complexity of scientists in this field are far from being sated. However, single thread performance has stalled, even sometimes decreased, over the last decade, and has been replaced by ever more parallel systems: both as conventional multicore CPUs and in the emerging world of accelerators. At the same time, the needs of scientists to couple ever-more complex dynamics and parametrisations into their models makes the model development task vastly more complex. The conventional approach of writing code in low level languages such as Fortran or C/C++ and then hand-coding parallelism for different platforms by adding library calls and directives forces the intermingling of the numerical code with its implementation. This results in an almost impossible set of skill requirements for developers, who must simultaneously be domain science experts, numericists, software engineers and parallelisation specialists. Even more critically, it requires code to be essentially rewritten for each emerging hardware platform. Since new platforms are emerging constantly, and since code owners do not usually control the procurement of the supercomputers on which they must run, this represents an unsustainable development load. The Firedrake system, conversely, offers the developer the opportunity to write PDE discretisations in the high-level mathematical language UFL from the FEniCS project (http://fenicsproject.org). Non-PDE model components, such as parametrisations, can be written as short C kernels operating locally on the underlying mesh, with no explicit parallelism. The executable code is then generated in C, CUDA or OpenCL and executed in parallel on the target architecture. The system also offers features of special relevance to the geosciences. In particular, the large scale separation between the vertical and horizontal directions in many geoscientific processes can be exploited to offer the flexibility of unstructured meshes in the horizontal direction, without the performance penalty usually associated with those methods.

  18. Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    NASA Technical Reports Server (NTRS)

    Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

    1994-01-01

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.

  19. Using CLIPS in the domain of knowledge-based massively parallel programming

    NASA Technical Reports Server (NTRS)

    Dvorak, Jiri J.

    1994-01-01

    The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.

  20. Q14 - Standards Development Plan, Ada Interfaces to X Window System, Analysis and Recommendations

    DTIC Science & Technology

    1989-03-20

    portability and reusability. -, . /- , ... ’ 4t. I-,: 2 Introduction Two major thrusts of the STARS program, and industry as a whole, are application...and IEEE, and in industry consortiums to show the directions X is taking and the opportunities for Ada to utilize this work. X is not the only window...and actually prohibit portability, but to avoid this the X developers formed the X Consortium, consisting of industry and academic members, who define

  1. Evolving binary classifiers through parallel computation of multiple fitness cases.

    PubMed

    Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

    2005-06-01

    This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.

  2. Implementations of BLAST for parallel computers.

    PubMed

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers-the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.

  3. Programming parallel architectures: The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  4. Exploiting Vector and Multicore Parallelsim for Recursive, Data- and Task-Parallel Programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

    Modern hardware contains parallel execution resources that are well-suited for data-parallelism-vector units-and task parallelism-multicores. However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently as multicores. Our framework allows us to define schedulers that can dynamically select between executing task- blocks on vector units or multicores. We show that thesemore » schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel pro- grams into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.« less

  5. High-performance computing — an overview

    NASA Astrophysics Data System (ADS)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  6. The WorkPlace distributed processing environment

    NASA Technical Reports Server (NTRS)

    Ames, Troy; Henderson, Scott

    1993-01-01

    Real time control problems require robust, high performance solutions. Distributed computing can offer high performance through parallelism and robustness through redundancy. Unfortunately, implementing distributed systems with these characteristics places a significant burden on the applications programmers. Goddard Code 522 has developed WorkPlace to alleviate this burden. WorkPlace is a small, portable, embeddable network interface which automates message routing, failure detection, and re-configuration in response to failures in distributed systems. This paper describes the design and use of WorkPlace, and its application in the construction of a distributed blackboard system.

  7. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    The real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processor with shared memory. To achieve an efficient parallel implementation, it's necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelization of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare the OpenMP (an application programming interface for multi-Processing) with Ptheads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

  8. From Petascale to Exascale: Eight Focus Areas of R&D Challenges for HPC Simulation Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Springmeyer, R; Still, C; Schulz, M

    2011-03-17

    Programming models bridge the gap between the underlying hardware architecture and the supporting layers of software available to applications. Programming models are different from both programming languages and application programming interfaces (APIs). Specifically, a programming model is an abstraction of the underlying computer system that allows for the expression of both algorithms and data structures. In comparison, languages and APIs provide implementations of these abstractions and allow the algorithms and data structures to be put into practice - a programming model exists independently of the choice of both the programming language and the supporting APIs. Programming models are typically focusedmore » on achieving increased developer productivity, performance, and portability to other system designs. The rapidly changing nature of processor architectures and the complexity of designing an exascale platform provide significant challenges for these goals. Several other factors are likely to impact the design of future programming models. In particular, the representation and management of increasing levels of parallelism, concurrency and memory hierarchies, combined with the ability to maintain a progressive level of interoperability with today's applications are of significant concern. Overall the design of a programming model is inherently tied not only to the underlying hardware architecture, but also to the requirements of applications and libraries including data analysis, visualization, and uncertainty quantification. Furthermore, the successful implementation of a programming model is dependent on exposed features of the runtime software layers and features of the operating system. Successful use of a programming model also requires effective presentation to the software developer within the context of traditional and new software development tools. Consideration must also be given to the impact of programming models on both languages and the associated compiler infrastructure. Exascale programming models must reflect several, often competing, design goals. These design goals include desirable features such as abstraction and separation of concerns. However, some aspects are unique to large-scale computing. For example, interoperability and composability with existing implementations will prove critical. In particular, performance is the essential underlying goal for large-scale systems. A key evaluation metric for exascale models will be the extent to which they support these goals rather than merely enable them.« less

  9. The Forest Method as a New Parallel Tree Method with the Sectional Voronoi Tessellation

    NASA Astrophysics Data System (ADS)

    Yahagi, Hideki; Mori, Masao; Yoshii, Yuzuru

    1999-09-01

    We have developed a new parallel tree method which will be called the forest method hereafter. This new method uses the sectional Voronoi tessellation (SVT) for the domain decomposition. The SVT decomposes a whole space into polyhedra and allows their flat borders to move by assigning different weights. The forest method determines these weights based on the load balancing among processors by means of the overload diffusion (OLD). Moreover, since all the borders are flat, before receiving the data from other processors, each processor can collect enough data to calculate the gravity force with precision. Both the SVT and the OLD are coded in a highly vectorizable manner to accommodate on vector parallel processors. The parallel code based on the forest method with the Message Passing Interface is run on various platforms so that a wide portability is guaranteed. Extensive calculations with 15 processors of Fujitsu VPP300/16R indicate that the code can calculate the gravity force exerted on 105 particles in each second for some ideal dark halo. This code is found to enable an N-body simulation with 107 or more particles for a wide dynamic range and is therefore a very powerful tool for the study of galaxy formation and large-scale structure in the universe.

  10. Hybrid Parallel Contour Trees, Version 1.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sewell, Christopher; Fasel, Patricia; Carr, Hamish

    A common operation in scientific visualization is to compute and render a contour of a data set. Given a function of the form f : R^d -> R, a level set is defined as an inverse image f^-1(h) for an isovalue h, and a contour is a single connected component of a level set. The Reeb graph can then be defined to be the result of contracting each contour to a single point, and is well defined for Euclidean spaces or for general manifolds. For simple domains, the graph is guaranteed to be a tree, and is called the contourmore » tree. Analysis can then be performed on the contour tree in order to identify isovalues of particular interest, based on various metrics, and render the corresponding contours, without having to know such isovalues a priori. This code is intended to be the first data-parallel algorithm for computing contour trees. Our implementation will use the portable data-parallel primitives provided by Nvidia’s Thrust library, allowing us to compile our same code for both GPUs and multi-core CPUs. Native OpenMP and purely serial versions of the code will likely also be included. It will also be extended to provide a hybrid data-parallel / distributed algorithm, allowing scaling beyond a single GPU or CPU.« less

  11. Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior

    NASA Technical Reports Server (NTRS)

    Gelenbe, Erol

    1988-01-01

    An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.

  12. Real time software tools and methodologies

    NASA Technical Reports Server (NTRS)

    Christofferson, M. J.

    1981-01-01

    Real time systems are characterized by high speed processing and throughput as well as asynchronous event processing requirements. These requirements give rise to particular implementations of parallel or pipeline multitasking structures, of intertask or interprocess communications mechanisms, and finally of message (buffer) routing or switching mechanisms. These mechanisms or structures, along with the data structue, describe the essential character of the system. These common structural elements and mechanisms are identified, their implementation in the form of routines, tasks or macros - in other words, tools are formalized. The tools developed support or make available the following: reentrant task creation, generalized message routing techniques, generalized task structures/task families, standardized intertask communications mechanisms, and pipeline and parallel processing architectures in a multitasking environment. Tools development raise some interesting prospects in the areas of software instrumentation and software portability. These issues are discussed following the description of the tools themselves.

  13. Experiences with hypercube operating system instrumentation

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Rudolph, David C.

    1989-01-01

    The difficulties in conceptualizing the interactions among a large number of processors make it difficult both to identify the sources of inefficiencies and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed, necessary for large-scale parallel computers; the enormous volume of performance data mandates visual display.

  14. Communications oriented programming of parallel iterative solutions of sparse linear systems

    NASA Technical Reports Server (NTRS)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.

  15. Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

    PubMed

    Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

    2013-10-24

    Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (<100 ms) moved to the center only one saccade later. Our data on eye movement positions provide novel evidence of parallel programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.

  16. PyPele Rewritten To Use MPI

    NASA Technical Reports Server (NTRS)

    Hockney, George; Lee, Seungwon

    2008-01-01

    A computer program known as PyPele, originally written as a Pythonlanguage extension module of a C++ language program, has been rewritten in pure Python language. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission- design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message- passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without need to rewrite those programs.

  17. USL/DBMS NASA/PC R and D project C programming standards

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Moreau, Dennis R.

    1984-01-01

    A set of programming standards intended to promote reliability, readability, and portability of C programs written for PC research and development projects is established. These standards must be adhered to except where reasons for deviation are clearly identified and approved by the PC team. Any approved deviation from these standards must also be clearly documented in the pertinent source code.

  18. A survey of parallel programming tools

    NASA Technical Reports Server (NTRS)

    Cheng, Doreen Y.

    1991-01-01

    This survey examines 39 parallel programming tools. Focus is placed on those tool capabilites needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions, tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.

  19. Backtracking and Re-execution in the Automatic Debugging of Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Matthews, Gregory; Hood, Robert; Johnson, Stephen; Leggett, Peter; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we describe a new approach using relative debugging to find differences in computation between a serial program and a parallel version of th it program. We use a combination of re-execution and backtracking in order to find the first difference in computation that may ultimately lead to an incorrect value that the user has indicated. In our prototype implementation we use static analysis information from a parallelization tool in order to perform the backtracking as well as the mapping required between serial and parallel computations.

  20. UPEML Version 3.0: A machine-portable CDC update emulator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mehlhorn, T.A.; Haill, T.A.

    1992-04-01

    UPEML is a machine-portable program that emulates a subset of the functions of the standard CDC Update. Machine-portability has been achieved by conforming to ANSI standards for Fortran-77. UPEML is compact and fairly efficient; however, it only allows a restricted syntax as compared with the CDC Update. This program was written primarily to facilitate the use of CDC-based scientific packages on alternate computer systems such as the VAX/VMS mainframes and UNIX workstations. UPEML has also been successfully used on the multiprocessor ELXSI, on CRAYs under both UNICOS and CTSS operating systems, and on Sun, HP, Stardent and IBM workstations. UPEMLmore » was originally released with the ITS electron/photon Monte Carlo transport package, which was developed on a CDC-7600 and makes extensive use of conditional file structure to combine several problem geometry and machine options into a single program file. UPEML 3.0 is an enhanced version of the original code and is being independently released for use at any installation or with any code package. Version 3.0 includes enhanced error checking, full ASCII character support, a program library audit capability, and a partial update option in which only selected or modified decks are written to the complete file. Version 3.0 also checks for overlapping corrections, allows processing of pested calls to common decks, and allows the use of alternate files in READ and ADDFILE commands. Finally, UPEML Version 3.0 allows the assignment of input and output files at runtime on the control line.« less

  1. UPEML Version 3. 0: A machine-portable CDC update emulator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mehlhorn, T.A.; Haill, T.A.

    1992-04-01

    UPEML is a machine-portable program that emulates a subset of the functions of the standard CDC Update. Machine-portability has been achieved by conforming to ANSI standards for Fortran-77. UPEML is compact and fairly efficient; however, it only allows a restricted syntax as compared with the CDC Update. This program was written primarily to facilitate the use of CDC-based scientific packages on alternate computer systems such as the VAX/VMS mainframes and UNIX workstations. UPEML has also been successfully used on the multiprocessor ELXSI, on CRAYs under both UNICOS and CTSS operating systems, and on Sun, HP, Stardent and IBM workstations. UPEMLmore » was originally released with the ITS electron/photon Monte Carlo transport package, which was developed on a CDC-7600 and makes extensive use of conditional file structure to combine several problem geometry and machine options into a single program file. UPEML 3.0 is an enhanced version of the original code and is being independently released for use at any installation or with any code package. Version 3.0 includes enhanced error checking, full ASCII character support, a program library audit capability, and a partial update option in which only selected or modified decks are written to the complete file. Version 3.0 also checks for overlapping corrections, allows processing of pested calls to common decks, and allows the use of alternate files in READ and ADDFILE commands. Finally, UPEML Version 3.0 allows the assignment of input and output files at runtime on the control line.« less

  2. Development of a Low-Cost, Noninvasive, Portable Visual Speech Recognition Program.

    PubMed

    Kohlberg, Gavriel D; Gal, Ya'akov Kobi; Lalwani, Anil K

    2016-09-01

    Loss of speech following tracheostomy and laryngectomy severely limits communication to simple gestures and facial expressions that are largely ineffective. To facilitate communication in these patients, we seek to develop a low-cost, noninvasive, portable, and simple visual speech recognition program (VSRP) to convert articulatory facial movements into speech. A Microsoft Kinect-based VSRP was developed to capture spatial coordinates of lip movements and translate them into speech. The articulatory speech movements associated with 12 sentences were used to train an artificial neural network classifier. The accuracy of the classifier was then evaluated on a separate, previously unseen set of articulatory speech movements. The VSRP was successfully implemented and tested in 5 subjects. It achieved an accuracy rate of 77.2% (65.0%-87.6% for the 5 speakers) on a 12-sentence data set. The mean time to classify an individual sentence was 2.03 milliseconds (1.91-2.16). We have demonstrated the feasibility of a low-cost, noninvasive, portable VSRP based on Kinect to accurately predict speech from articulation movements in clinically trivial time. This VSRP could be used as a novel communication device for aphonic patients. © The Author(s) 2016.

  3. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.

  4. An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

    2015-01-01

    This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.

  5. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.

  6. Automated Portable Test System (APTS) - A performance envelope assessment tool

    NASA Technical Reports Server (NTRS)

    Kennedy, R. S.; Dunlap, W. P.; Jones, M. B.; Wilkes, R. L.; Bittner, A. C., Jr.

    1985-01-01

    The reliability and stability of microcomputer-based psychological tests are evaluated. The hardware, test programs, and system control of the Automated Portable Test System, which assesses human performance and subjective status, are described. Subjects were administered 11 pen-and-pencil and microcomputer-based tests for 10 sessions. The data reveal that nine of the 10 tests stabilized by the third administration; inertial correlations were high and consistent. It is noted that the microcomputer-based tests display good psychometric properties in terms of differential stability and reliability.

  7. Introduction to the Portable Life Support Schematic and Technology Development Components

    NASA Technical Reports Server (NTRS)

    Conger, Bruce

    2008-01-01

    Conger presented the operations and functions of the baseline Constellation Program (CxP) Portable Life Support System (PLSS) schematic and key development technologies. He explained the functional descriptions of the schematic components in the fluid systems of the PLSS for multiple operational scenarios. PLSS subsystems include the oxygen subsystem, the ventilation subsystem, and the thermal subsystem. He also presented the operational PLSS modes: Nominal EVA mode, Umbilical - no recharge mode, Umbilical - with recharge mode, BENDS mode, BUDDY mode, Secondary oxygen mode, and the PLSS-removed umbilical mode.

  8. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems.

    PubMed

    Stone, John E; Gohara, David; Shi, Guochun

    2010-05-01

    We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures.

  9. Parallel computing techniques for rotorcraft aerodynamics

    NASA Astrophysics Data System (ADS)

    Ekici, Kivanc

    The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, main modifications to the implicit operator, Lower-Upper Symmetric Gauss Seidel (LU-SGS) originally used in TURNS, is performed. Second, application of an inexact Newton method, coupled with a Krylov subspace iterative method (Newton-Krylov method) is carried out. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods to the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).

  10. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.

  11. Rubus: A compiler for seamless and extensible parallelism.

    PubMed

    Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.

  12. Rubus: A compiler for seamless and extensible parallelism

    PubMed Central

    Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758

  13. Microprocessor controlled portable TLD system

    NASA Technical Reports Server (NTRS)

    Apathy, I.; Deme, S.; Feher, I.

    1996-01-01

    An up-to-date microprocessor controlled thermoluminescence dosemeter (TLD) system for environmental and space dose measurements has been developed. The earlier version of the portable TLD system, Pille, was successfully used on Soviet orbital stations as well as on the US Space Shuttle, and for environmental monitoring. The new portable TLD system, Pille'95, consists of a reader and TL bulb dosemeters, and each dosemeter is provided with an EEPROM chip for automatic identification. The glow curve data are digitised and analysed by the program of the reader. The measured data and the identification number appear on the LED display of the reader. Up to several thousand measured data together with the glow curves can be stored on a removable flash memory card. The whole system is supplied either from built-in rechargeable batteries or from the mains of the space station.

  14. Efficient partitioning and assignment on programs for multiprocessor execution

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1993-01-01

    The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics, developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.

  15. 2014 Runtime Systems Summit. Runtime Systems Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sarkar, Vivek; Budimlic, Zoran; Kulkani, Milind

    2016-09-19

    This report summarizes runtime system challenges for exascale computing, that follow from the fundamental challenges for exascale systems that have been well studied in past reports, e.g., [6, 33, 34, 32, 24]. Some of the key exascale challenges that pertain to runtime systems include parallelism, energy efficiency, memory hierarchies, data movement, heterogeneous processors and memories, resilience, performance variability, dynamic resource allocation, performance portability, and interoperability with legacy code. In addition to summarizing these challenges, the report also outlines different approaches to addressing these significant challenges that have been pursued by research projects in the DOE-sponsored X-Stack and OS/R programs. Sincemore » there is often confusion as to what exactly the term “runtime system” refers to in the software stack, we include a section on taxonomy to clarify the terminology used by participants in these research projects. In addition, we include a section on deployment opportunities for vendors and government labs to build on the research results from these projects. Finally, this report is also intended to provide a framework for discussing future research and development investments for exascale runtime systems, and for clarifying the role of runtime systems in exascale software.« less

  16. Microfluidic Lab-on-a-Chip Platforms: Requirements, Characteristics and Applications

    NASA Astrophysics Data System (ADS)

    Mark, D.; Haeberle, S.; Roth, G.; Von Stetten, F.; Zengerle, R.

    This review summarizes recent developments in microfluidic platform approaches. In contrast to isolated application-specific solutions, a microfluidic platform provides a set of fluidic unit operations, which are designed for easy combination within a well-defined fabrication technology. This allows the implementation of different application-specific (bio-) chemical processes, automated by microfluidic process integration [1]. A brief introduction into technical advances, major market segments and promising applications is followed by a detailed characterization of different microfluidic platforms, comprising a short definition, the functional principle, microfluidic unit operations, application examples as well as strengths and limitations. The microfluidic platforms in focus are lateral flow tests, linear actuated devices, pressure driven laminar flow, microfluidic large scale integration, segmented flow microfluidics, centrifugal microfluidics, electro-kinetics, electrowetting, surface acoustic waves, and systems for massively parallel analysis. The review concludes with the attempt to provide a selection scheme for microfluidic platforms which is based on their characteristics according to key requirements of different applications and market segments. Applied selection criteria comprise portability, costs of instrument and disposable, sample throughput, number of parameters per sample, reagent consumption, precision, diversity of microfluidic unit operations and the flexibility in programming different liquid handling protocols.

  17. A mechanism for efficient debugging of parallel programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, B.P.; Choi, J.D.

    1988-01-01

    This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. The extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions ofmore » the co-operating processes.« less

  18. Assured Access/Mobile Computing Initiatives on Five University Campuses.

    ERIC Educational Resources Information Center

    Blurton, Craig; Chee, Yam San; Long, Phillip D.; Resmer, Mark; Runde, Craig

    Mobile computing and assured access are becoming popular terms to describe a growing number of university programs which take advantage of ubiquitous network access points and the portability of notebook computers to ensure all students have access to digital tools and resources. However, the implementation of such programs varies widely from…

  19. State Student Incentive Grant Program: Issues in Partnership.

    ERIC Educational Resources Information Center

    Lee, John; And Others

    Some of the issues concerning the evolving relationship between state and federal agencies in the field of student financial aid are examined, with attention to the State Student Incentive Grant Program (SSIG). After tracing the history of the SSIG, the following issues are considered: SSIG portability; state control of fraud, abuse, and error;…

  20. Commentary on A Successful Microfiche Program

    ERIC Educational Resources Information Center

    Campbell, Bill W.

    1978-01-01

    A chart is given showing external document requests filled by hard copy versus microfiche from January 1976 to January 1977. The main factor contributing to the success of the microfiche program of this aircraft company's library has been getting portable readers into the hands of those engineers most open to the use of film. (Author/JPF)

  1. Applications of artificial intelligence to mission planning

    NASA Technical Reports Server (NTRS)

    Ford, Donnie R.; Floyd, Stephen A.; Rogers, John S.

    1990-01-01

    The following subject areas are covered: object-oriented programming task; rule-based programming task; algorithms for resource allocation; connecting a Symbolics to a VAX; FORTRAN from Lisp; trees and forest task; software data structure conversion; software functionality modifications and enhancements; portability of resource allocation to a TI MicroExplorer; frontier of feasibility software system; and conclusions.

  2. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

    PubMed Central

    Stone, John E.; Gohara, David; Shi, Guochun

    2010-01-01

    We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures. PMID:21037981

  3. The Monterey Ocean Observing System Development Program

    NASA Astrophysics Data System (ADS)

    Chaffey, M.; Graybeal, J. B.; O'Reilly, T.; Ryan, J.

    2004-12-01

    The Monterey Bay Aquarium Research Institute (MBARI) has a major development program underway to design, build, test and apply technology suitable to deep ocean observatories. The Monterey Ocean Observing System (MOOS) program is designed to form a large-scale instrument network that provides generic interfaces, intelligent instrument support, data archiving and near-real-time interaction for observatory experiments. The MOOS mooring system is designed as a portable surface mooring based seafloor observatory that provides data and power connections to both seafloor and ocean surface instruments through a specialty anchor cable. The surface mooring collects solar and wind energy for powering instruments and transmits data to shore-side researchers using a satellite communications modem. The use of a high modulus anchor cable to reach seafloor instrument networks is a high-risk development effort that is critical for the overall success of the portable observatory concept. An aggressive field test program off the California coast is underway to improve anchor cable constructions as well as end-to-end test overall system design. The overall MOOS observatory systems view is presented and the results of our field tests completed to date are summarized.

  4. Genetic algorithms using SISAL parallel programming language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tejada, S.

    1994-05-06

    Genetic algorithms are a mathematical optimization technique developed by John Holland at the University of Michigan [1]. The SISAL programming language possesses many of the characteristics desired to implement genetic algorithms. SISAL is a deterministic, functional programming language which is inherently parallel. Because SISAL is functional and based on mathematical concepts, genetic algorithms can be efficiently translated into the language. Several of the steps involved in genetic algorithms, such as mutation, crossover, and fitness evaluation, can be parallelized using SISAL. In this paper I will l discuss the implementation and performance of parallel genetic algorithms in SISAL.

  5. An Expert System for the Development of Efficient Parallel Code

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  6. Solving Integer Programs from Dependence and Synchronization Problems

    DTIC Science & Technology

    1993-03-01

    DEFF.NSNE Solving Integer Programs from Dependence and Synchronization Problems Jaspal Subhlok March 1993 CMU-CS-93-130 School of Computer ScienceT IC...method Is an exact and efficient way of solving integer programming problems arising in dependence and synchronization analysis of parallel programs...7/;- p Keywords: Exact dependence tesing, integer programming. parallelilzng compilers, parallel program analysis, synchronization analysis Solving

  7. Programming Tools: Status, Evaluation, and Comparison

    NASA Technical Reports Server (NTRS)

    Cheng, Doreen Y.; Cooper, D. M. (Technical Monitor)

    1994-01-01

    In this tutorial I will first describe the characteristics of scientific applications and their developers, and describe the computing environment in a typical high-performance computing center. I will define the user requirements for tools that support application portability and present the difficulties to satisfy them. These form the basis of the evaluation and comparison of the tools. I will then describe the tools available in the market and the tools available in the public domain. Specifically, I will describe the tools for converting sequential programs, tools for developing portable new programs, tools for debugging and performance tuning, tools for partitioning and mapping, and tools for managing network of resources. I will introduce the main goals and approaches of the tools, and show main features of a few tools in each category. Meanwhile, I will compare tool usability for real-world application development and compare their different technological approaches. Finally, I will indicate the future directions of the tools in each category.

  8. An easy-to-operate portable pulse-height analysis system for area monitoring with TEPC in radiation protection

    NASA Astrophysics Data System (ADS)

    Kunz, A.; Pihet, P.; Arend, E.; Menzel, H. G.

    1990-12-01

    A portable area monitor for the measurement of dose-equivalent quantities in practical radiation-protection work has been developed. The detector applied is a low-pressure proportional counter (TEPC) used in microdosimetry. The complex analysis system required has been optimized with regard to low power consumption and small size to achieve a real operational survey meter. The newly designed electronic includes complete analog, digital and microprocessor boards. It presents the characteristic of fast pulse-height processing over a large (5 decades) dynamic range. Three original circuits have been specifically developed, consisting of: (1) a miniaturized adjustable high-voltage power supply with low ripple and high stability; (2) a double spectroscopy amplifier with constant gain ratio and common pole-zero stage; and (3) an analog-to-digital converter with quasi-logarithmic characteristics based on a flash converter using fast comparators associated in parallel. With the incorporated single-board computer, the maximal total power consumption is 5 W, enabling 40 hours operation time with batteries. With minor adaptations the equipment is proposed as a low-cost solution for various measuring problems in environmental studies.

  9. Design and implementation of improved LsCpLp resonant circuit for power supply for high-power electromagnetic acoustic transducer excitation

    NASA Astrophysics Data System (ADS)

    Zao, Yongming; Ouyang, Qi; Chen, Jiawei; Zhang, Xinglan; Hou, Shuaicheng

    2017-08-01

    This paper investigates the design and implementation of an improved series-parallel inductor-capacitor-inductor (LsCpLp) resonant circuit power supply for excitation of electromagnetic acoustic transducers (EMATs). The main advantage of the proposed resonant circuit is the absence of a high-permeability dynamic transformer. A high-frequency pulsating voltage gain can be achieved through a double resonance phenomenon. Both resonant tailing behavior and higher harmonics are suppressed by the improved resonant circuit, which also contributes to the generation of ultrasonic waves. Additionally, the proposed circuit can realize impedance matching and can also optimize the transduction efficiency. The complete design and implementation procedure for the power supply is described and has been validated by implementation of the proposed power supply to drive a portable EMAT. The circuit simulation results show close agreement with the experimental results and thus confirm the validity of the proposed topology. The proposed circuit is suitable for use as a portable EMAT excitation power supply that is fed by a low-voltage source.

  10. Freestanding mesoporous VN/CNT hybrid electrodes for flexible all-solid-state supercapacitors.

    PubMed

    Xiao, Xu; Peng, Xiang; Jin, Huanyu; Li, Tianqi; Zhang, Chengcheng; Gao, Biao; Hu, Bin; Huo, Kaifu; Zhou, Jun

    2013-09-25

    High-performance all-solid-state supercapacitors (SCs) are fabricated based on thin, lightweight, and flexible freestanding MVNN/CNT hybrid electrodes. The device shows a high volume capacitance of 7.9 F/cm(3) , volume energy and power density of 0.54 mWh/cm(3) and 0.4 W/cm(3) at a current density of 0.025 A/cm(3) . By being highly flexible, environmentally friendly, and easily connectable in series and parallel, the all-solid-state SCs promise potential applications in portable/wearable electronics. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Using collective variables to drive molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Fiorin, Giacomo; Klein, Michael L.; Hénin, Jérôme

    2013-12-01

    A software framework is introduced that facilitates the application of biasing algorithms to collective variables of the type commonly employed to drive massively parallel molecular dynamics (MD) simulations. The modular framework that is presented enables one to combine existing collective variables into new ones, and combine any chosen collective variable with available biasing methods. The latter include the classic time-dependent biases referred to as steered MD and targeted MD, the temperature-accelerated MD algorithm, as well as the adaptive free-energy biases called metadynamics and adaptive biasing force. The present modular software is extensible, and portable between commonly used MD simulation engines.

  12. Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit CFD

    NASA Technical Reports Server (NTRS)

    Gropp, W. D.; Keyes, D. E.; McInnes, L. C.; Tidriri, M. D.

    1998-01-01

    Implicit solution methods are important in applications modeled by PDEs with disparate temporal and spatial scales. Because such applications require high resolution with reasonable turnaround, "routine" parallelization is essential. The pseudo-transient matrix-free Newton-Krylov-Schwarz (Psi-NKS) algorithmic framework is presented as an answer. We show that, for the classical problem of three-dimensional transonic Euler flow about an M6 wing, Psi-NKS can simultaneously deliver: globalized, asymptotically rapid convergence through adaptive pseudo- transient continuation and Newton's method-, reasonable parallelizability for an implicit method through deferred synchronization and favorable communication-to-computation scaling in the Krylov linear solver; and high per- processor performance through attention to distributed memory and cache locality, especially through the Schwarz preconditioner. Two discouraging features of Psi-NKS methods are their sensitivity to the coding of the underlying PDE discretization and the large number of parameters that must be selected to govern convergence. We therefore distill several recommendations from our experience and from our reading of the literature on various algorithmic components of Psi-NKS, and we describe a freely available, MPI-based portable parallel software implementation of the solver employed here.

  13. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    DOE PAGES

    Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems.more » Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.« less

  14. Distributed and parallel Ada and the Ada 9X recommendations

    NASA Technical Reports Server (NTRS)

    Volz, Richard A.; Goldsack, Stephen J.; Theriault, R.; Waldrop, Raymond S.; Holzbacher-Valero, A. A.

    1992-01-01

    Recently, the DoD has sponsored work towards a new version of Ada, intended to support the construction of distributed systems. The revised version, often called Ada 9X, will become the new standard sometimes in the 1990s. It is intended that Ada 9X should provide language features giving limited support for distributed system construction. The requirements for such features are given. Many of the most advanced computer applications involve embedded systems that are comprised of parallel processors or networks of distributed computers. If Ada is to become the widely adopted language envisioned by many, it is essential that suitable compilers and tools be available to facilitate the creation of distributed and parallel Ada programs for these applications. The major languages issues impacting distributed and parallel programming are reviewed, and some principles upon which distributed/parallel language systems should be built are suggested. Based upon these, alternative language concepts for distributed/parallel programming are analyzed.

  15. Methodology for cost analysis of film-based and filmless portable chest systems

    NASA Astrophysics Data System (ADS)

    Melson, David L.; Gauvain, Karen M.; Beardslee, Brian M.; Kraitsik, Michael J.; Burton, Larry; Blaine, G. James; Brink, Gary S.

    1996-05-01

    Many studies analyzing the costs of film-based and filmless radiology have focused on multi- modality, hospital-wide solutions. Yet due to the enormous cost of converting an entire large radiology department or hospital to a filmless environment all at once, institutions often choose to eliminate film one area at a time. Narrowing the focus of cost-analysis may be useful in making such decisions. This presentation will outline a methodology for analyzing the cost per exam of film-based and filmless solutions for providing portable chest exams to Intensive Care Units (ICUs). The methodology, unlike most in the literature, is based on parallel data collection from existing filmless and film-based ICUs, and is currently being utilized at our institution. Direct costs, taken from the perspective of the hospital, for portable computed radiography chest exams in one filmless and two film-based ICUs are identified. The major cost components are labor, equipment, materials, and storage. Methods for gathering and analyzing each of the cost components are discussed, including FTE-based and time-based labor analysis, incorporation of equipment depreciation, lease, and maintenance costs, and estimation of materials costs. Extrapolation of data from three ICUs to model hypothetical, hospital-wide film-based and filmless ICU imaging systems is described. Performance of sensitivity analysis on the filmless model to assess the impact of anticipated reductions in specific labor, equipment, and archiving costs is detailed. A number of indirect costs, which are not explicitly included in the analysis, are identified and discussed.

  16. Implementation and performance of parallel Prolog interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, S.; Kale, L.V.; Balkrishna, R.

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.

  17. The UNIX Operating System: A Model for Software Design.

    ERIC Educational Resources Information Center

    Kernighan, Brian W.; Morgan, Samuel P.

    1982-01-01

    Describes UNIX time-sharing operating system, including the program environment, software development tools, flexibility and ease of change, portability and other advantages, and five applications and three nonapplications of the system. (JN)

  18. Development and flight test of a helicopter compact, portable, precision landing system concept

    NASA Technical Reports Server (NTRS)

    Clary, G. R.; Bull, J. S.; Davis, T. J.; Chisholm, J. P.

    1984-01-01

    An airborne, radar-based, precision approach concept is being developed and flight tested as a part of NASA's Rotorcraft All-Weather Operations Research Program. A transponder-based beacon landing system (BLS) applying state-of-the-art X-band radar technology and digital processing techniques, was built and is being flight tested to demonstrate the concept feasibility. The BLS airborne hardware consists of an add-on microprocessor, installed in conjunction with the aircraft weather/mapping radar, which analyzes the radar beacon receiver returns and determines range, localizer deviation, and glide-slope deviation. The ground station is an inexpensive, portable unit which can be quickly deployed at a landing site. Results from the flight test program show that the BLS concept has a significant potential for providing rotorcraft with low-cost, precision instrument approach capability in remote areas.

  19. GeoBuilder: a geometric algorithm visualization and debugging system for 2D and 3D geometric computing.

    PubMed

    Wei, Jyh-Da; Tsai, Ming-Hung; Lee, Gen-Cher; Huang, Jeng-Hung; Lee, Der-Tsai

    2009-01-01

    Algorithm visualization is a unique research topic that integrates engineering skills such as computer graphics, system programming, database management, computer networks, etc., to facilitate algorithmic researchers in testing their ideas, demonstrating new findings, and teaching algorithm design in the classroom. Within the broad applications of algorithm visualization, there still remain performance issues that deserve further research, e.g., system portability, collaboration capability, and animation effect in 3D environments. Using modern technologies of Java programming, we develop an algorithm visualization and debugging system, dubbed GeoBuilder, for geometric computing. The GeoBuilder system features Java's promising portability, engagement of collaboration in algorithm development, and automatic camera positioning for tracking 3D geometric objects. In this paper, we describe the design of the GeoBuilder system and demonstrate its applications.

  20. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele

    2001-01-01

    This viewgraph presentation provides information on support sources available for the automatic parallelization of computer program. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence, ensuring that the code transformation was accurate.

  1. Quality consciousness...auditing for HIPAA Privacy Compliance.

    PubMed

    LePar, Kathleen

    2004-01-01

    The Health Insurance Portability and Accountability Act (HIPAA) privacy deadline has passed. Now it is essential to comply with the regulations. The stakes are high; therefore, a HIPAA Privacy Compliance Program must be part of an organization's quality initiatives. This article provides guidelines for the challenges of continual program improvement, successful cultural change, and effective monitoring of the existing program. Healthcare organizations will attain compliance goals through internal audits on the processes, policies, and training efforts of their HIPAA program.

  2. Carbon monoxide poisoning from portable electric generators.

    PubMed

    Hampson, Neil B; Zmaeff, Jennette L

    2005-01-01

    While the overall death rate from unintentional carbon monoxide (CO) poisoning has decreased in the United States due to improved automobile emissions controls and a decline in CO poisonings from motor vehicles, exposures have not changed from some sources of CO. One of these is the operation of portable electrical generators in poorly ventilated spaces. This study sought to describe the population poisoned from CO produced by portable electric generators, and to determine the reasons that generators are operated in a hazardous fashion. Cases of CO poisoning referred for treatment with hyperbaric oxygen at Virginia Mason Medical Center in Seattle from November 1978 to March 2004 were reviewed. Those cases that resulted from portable generator use were selected for analysis. Sixty-three patients aged 2 to 85 years were treated for CO poisoning from portable electric generators. They included 34 males and 29 females who were poisoned in 37 separate incidents. Thirty-four lost consciousness with the exposure. Of the 63 total patients, 60 spoke English. Generators were typically used when normal electrical service was disrupted by a storm or in remote locations. In 29 of 37 incidents, the generator was operated in the home environment, most commonly in the garage. Lack of awareness of the dangers of CO poisoning or lack of knowledge of ventilation requirements were the most commonly identified reasons. CO poisoning from portable electric generators occurs in a characteristic population, in a few typical locations and for a limited number of reasons. This information may help target prevention efforts for this form of poisoning, such as warning labels or educational programs.

  3. Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

    NASA Astrophysics Data System (ADS)

    Work, Paul R.

    1991-12-01

    This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.

  4. Parallel Implementation of a Frozen Flow Based Wavefront Reconstructor

    NASA Astrophysics Data System (ADS)

    Nagy, J.; Kelly, K.

    2013-09-01

    Obtaining high resolution images of space objects from ground based telescopes is challenging, often requiring the use of a multi-frame blind deconvolution (MFBD) algorithm to remove blur caused by atmospheric turbulence. In order for an MFBD algorithm to be effective, it is necessary to obtain a good initial estimate of the wavefront phase. Although wavefront sensors work well in low turbulence situations, they are less effective in high turbulence, such as when imaging in daylight, or when imaging objects that are close to the Earth's horizon. One promising approach, which has been shown to work very well in high turbulence settings, uses a frozen flow assumption on the atmosphere to capture the inherent temporal correlations present in consecutive frames of wavefront data. Exploiting these correlations can lead to more accurate estimation of the wavefront phase, and the associated PSF, which leads to more effective MFBD algorithms. However, with the current serial implementation, the approach can be prohibitively expensive in situations when it is necessary to use a large number of frames. In this poster we describe a parallel implementation that overcomes this constraint. The parallel implementation exploits sparse matrix computations, and uses the Trilinos package developed at Sandia National Laboratories. Trilinos provides a variety of core mathematical software for parallel architectures that have been designed using high quality software engineering practices, The package is open source, and portable to a variety of high-performance computing architectures.

  5. The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

    1997-01-01

    Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore, exploit behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring Systematic has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAB Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.

  6. What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting-a large scientific application in order to employ a new programming paradigms is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP which was developed by Taft. The approach is similar to MPi/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As it is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.

  7. Bedform movement recorded by sequential single-beam surveys in tidal rivers

    USGS Publications Warehouse

    Dinehart, R.L.

    2002-01-01

    A portable system for bedform-mapping was evaluated in the delta of the lower Sacramento and San Joaquin Rivers, California, from 1998 to 2000. Bedform profiles were surveyed with a two-person crew using an array of four single-beam transducers on boats about 6 m in length. Methods for processing the bedform profiles into maps with geographic coordinates were developed for spreadsheet programs and surface-contouring software. Straight reaches were surveyed every few days or weeks to determine locations of sand deposition, net transport directions, flow thresholds for bedform regimes, and bedform-transport rates. In one channel of unidirectional flow, the portable system was used to record changes in bedform regime through minor fluctuations of low discharge, and through high discharges near channel capacity. In another channel with reversing flows from tides, the portable system recorded directions of net bedload-transport that would be undetectable by standard bedload sampling alone.

  8. A Portable Electronic Nose For Hydrazine and Monomethyl Hydrazine Detection

    NASA Technical Reports Server (NTRS)

    Young, Rebecca C.; Linnell, Bruce R.; Peterson, Barbara V.; Brooks, Kathy B.; Griffin, Tim P.

    2004-01-01

    The Space Program and military use large quantities Hydrazine (Hz) and monomethyl hydrazine (MMI-I) as rocket propellant. These substances are very toxic and are suspected human carcinogens. The American Conference of Governmental Industrial Hygienist set the threshold limit value to be 10 parts per billion (ppb). Current off-the-shelf portable instruments require 10 to 20 minutes of exposure to detect 10 ppb concentration. This shortcofriing is not acceptable for many operations. A new prototype instrument using a gas sensor array and pattern recognition software technology (i.e., an electronic nose) has demonstrated the ability to identify either Hz or MM}{ and quantify their concentrations at 10 parts per billion in 90 seconds. This paper describes the design of the portable electronic nose (e-nose) instrument, test equipment setup, test protocol, pattern recognition algorithm, concentration estimation method, and laboratory test results.

  9. Using an Interactive Computer Program to Communicate With the Wilderness Visitor

    Treesearch

    David W. Harmon

    1992-01-01

    The Bureau of Land Management, Oregon State Office, identified a need for a tool to communicate with wilderness visitors, managers, and decisionmakers regarding wilderness values and existing resource information in 87 wilderness study areas. An interactive computer program was developed using a portable Macintosh computer, a touch screen monitor, and laser disk player...

  10. 78 FR 50114 - Distribution of 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, and 2009 Satellite...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-16

    ... categories of copyrightable content (e.g., movies, music, and sports programming). At Phase II, the royalties... electronic copy in Portable Document Format (PDF) on a Compact Disc, along with the $150 filing fee, to the... request are: Joint Sports Claimants (JSC), Program Suppliers, Devotional Claimants, [[Page 50115...

  11. 76 FR 7590 - Distribution of 2000, 2001, 2002, and 2003 Cable Royalty Funds

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-02-10

    .... ADDRESSES: An original, five copies, and an electronic copy in Portable Document Format (PDF) on a CD of the... INFORMATION CONTACT: LaKeshia Brent, CRB Program Specialist, by telephone at (202) 707-7658, or e-mail at crb... of the major categories of copyrightable content (movies, sports programming, music, etc.) requesting...

  12. Recent Advances in High-Performance Direct Methanol Fuel Cells

    NASA Technical Reports Server (NTRS)

    Narayanan, S. R.; Chun, W.; Valdez, T. I.; Jeffries-Nakamura, B.; Frank, H.; Surumpudi, S.; Halpert, G.; Kosek, J.; Cropley, C.; La Conti, A. B.; hide

    1996-01-01

    Direct methanol fuel cells for portable power applications have been advanced significantly under DARPA- and ARO-sponsored programs over the last five years. A liquid-feed, direct methanol fuel cell developed under these programs, employs a proton exchange membrane as electrolyte and operates on aqueous solutions of methanol with air or oxygen as the oxidant.

  13. Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

    NASA Astrophysics Data System (ADS)

    Olson, Richard F.

    2013-05-01

    Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.

  14. Performance Evaluation in Network-Based Parallel Computing

    NASA Technical Reports Server (NTRS)

    Dezhgosha, Kamyar

    1996-01-01

    Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing SUNSPARCs' network with PVM (Parallel Virtual Machine) which is a software system for linking clusters of machines. Second, a set of three basic applications were selected. The applications consist of a parallel search, a parallel sort, a parallel matrix multiplication. These application programs were implemented in C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes in many cases is the restricting factor to performance. That is, coarse-grain parallelism which requires less frequent communication between processes will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.

  15. WFIRST: Science from the Guest Investigator and Parallel Observation Programs

    NASA Astrophysics Data System (ADS)

    Postman, Marc; Nataf, David; Furlanetto, Steve; Milam, Stephanie; Robertson, Brant; Williams, Ben; Teplitz, Harry; Moustakas, Leonidas; Geha, Marla; Gilbert, Karoline; Dickinson, Mark; Scolnic, Daniel; Ravindranath, Swara; Strolger, Louis; Peek, Joshua; Marc Postman

    2018-01-01

    The Wide Field InfraRed Survey Telescope (WFIRST) mission will provide an extremely rich archival dataset that will enable a broad range of scientific investigations beyond the initial objectives of the proposed key survey programs. The scientific impact of WFIRST will thus be significantly expanded by a robust Guest Investigator (GI) archival research program. We will present examples of GI research opportunities ranging from studies of the properties of a variety of Solar System objects, surveys of the outer Milky Way halo, comprehensive studies of cluster galaxies, to unique and new constraints on the epoch of cosmic re-ionization and the assembly of galaxies in the early universe.WFIRST will also support the acquisition of deep wide-field imaging and slitless spectroscopic data obtained in parallel during campaigns with the coronagraphic instrument (CGI). These parallel wide-field imager (WFI) datasets can provide deep imaging data covering several square degrees at no impact to the scheduling of the CGI program. A competitively selected program of well-designed parallel WFI observation programs will, like the GI science above, maximize the overall scientific impact of WFIRST. We will give two examples of parallel observations that could be conducted during a proposed CGI program centered on a dozen nearby stars.

  16. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  17. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    DOE PAGES

    Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

    2015-01-01

    This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were donemore » using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.« less

  18. A device-dependent interface for interactive image display

    NASA Technical Reports Server (NTRS)

    Perkins, D. C.; Szczur, M. R.; Owings, J.; Jamros, R. K.

    1984-01-01

    The structure of the device independent Display Management Subsystem (DMS) and the interface routines that are available to the applications programmer for use in developing a set of portable image display utility programs are described.

  19. Programming Probabilistic Structural Analysis for Parallel Processing Computer

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

    1991-01-01

    The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.

  20. Heart beats in the cloud: distributed analysis of electrophysiological ‘Big Data’ using cloud computing for epilepsy clinical research

    PubMed Central

    Sahoo, Satya S; Jayapandian, Catherine; Garg, Gaurav; Kaffashi, Farhad; Chung, Stephanie; Bozorgi, Alireza; Chen, Chien-Hun; Loparo, Kenneth; Lhatoo, Samden D; Zhang, Guo-Qiang

    2014-01-01

    Objective The rapidly growing volume of multimodal electrophysiological signal data is playing a critical role in patient care and clinical research across multiple disease domains, such as epilepsy and sleep medicine. To facilitate secondary use of these data, there is an urgent need to develop novel algorithms and informatics approaches using new cloud computing technologies as well as ontologies for collaborative multicenter studies. Materials and methods We present the Cloudwave platform, which (a) defines parallelized algorithms for computing cardiac measures using the MapReduce parallel programming framework, (b) supports real-time interaction with large volumes of electrophysiological signals, and (c) features signal visualization and querying functionalities using an ontology-driven web-based interface. Cloudwave is currently used in the multicenter National Institute of Neurological Diseases and Stroke (NINDS)-funded Prevention and Risk Identification of SUDEP (sudden unexplained death in epilepsy) Mortality (PRISM) project to identify risk factors for sudden death in epilepsy. Results Comparative evaluations of Cloudwave with traditional desktop approaches to compute cardiac measures (eg, QRS complexes, RR intervals, and instantaneous heart rate) on epilepsy patient data show one order of magnitude improvement for single-channel ECG data and 20 times improvement for four-channel ECG data. This enables Cloudwave to support real-time user interaction with signal data, which is semantically annotated with a novel epilepsy and seizure ontology. Discussion Data privacy is a critical issue in using cloud infrastructure, and cloud platforms, such as Amazon Web Services, offer features to support Health Insurance Portability and Accountability Act standards. Conclusion The Cloudwave platform is a new approach to leverage of large-scale electrophysiological data for advancing multicenter clinical research. PMID:24326538

  1. Heart beats in the cloud: distributed analysis of electrophysiological 'Big Data' using cloud computing for epilepsy clinical research.

    PubMed

    Sahoo, Satya S; Jayapandian, Catherine; Garg, Gaurav; Kaffashi, Farhad; Chung, Stephanie; Bozorgi, Alireza; Chen, Chien-Hun; Loparo, Kenneth; Lhatoo, Samden D; Zhang, Guo-Qiang

    2014-01-01

    The rapidly growing volume of multimodal electrophysiological signal data is playing a critical role in patient care and clinical research across multiple disease domains, such as epilepsy and sleep medicine. To facilitate secondary use of these data, there is an urgent need to develop novel algorithms and informatics approaches using new cloud computing technologies as well as ontologies for collaborative multicenter studies. We present the Cloudwave platform, which (a) defines parallelized algorithms for computing cardiac measures using the MapReduce parallel programming framework, (b) supports real-time interaction with large volumes of electrophysiological signals, and (c) features signal visualization and querying functionalities using an ontology-driven web-based interface. Cloudwave is currently used in the multicenter National Institute of Neurological Diseases and Stroke (NINDS)-funded Prevention and Risk Identification of SUDEP (sudden unexplained death in epilepsy) Mortality (PRISM) project to identify risk factors for sudden death in epilepsy. Comparative evaluations of Cloudwave with traditional desktop approaches to compute cardiac measures (eg, QRS complexes, RR intervals, and instantaneous heart rate) on epilepsy patient data show one order of magnitude improvement for single-channel ECG data and 20 times improvement for four-channel ECG data. This enables Cloudwave to support real-time user interaction with signal data, which is semantically annotated with a novel epilepsy and seizure ontology. Data privacy is a critical issue in using cloud infrastructure, and cloud platforms, such as Amazon Web Services, offer features to support Health Insurance Portability and Accountability Act standards. The Cloudwave platform is a new approach to leverage of large-scale electrophysiological data for advancing multicenter clinical research.

  2. Apollo: AN Automatic Procedure to Forecast Transport and Deposition of Tephra

    NASA Astrophysics Data System (ADS)

    Folch, A.; Costa, A.; Macedonio, G.

    2007-05-01

    Volcanic ash fallout represents a serious threat to communities around active volcanoes. Reliable short term predictions constitute a valuable support for to mitigate the effects of fallout on the surrounding area during an episode of crisis. We present a platform-independent automatic procedure aimed to daily forecast volcanic ash dispersal. The procedure builds on a series of programs and interfaces that allow an automatic data/results flow. Firstly the procedure downloads mesoscale meteorological forecasts for the region and period of interest, filters and converts data from its native format (typically GRIB format files), and sets up the CALMET diagnostic meteorological model to obtain hourly wind field and micro-meteorological variables on a finer mesh. Secondly a 1-D version of the buoyant plume equations assesses the distribution of mass along the eruptive column depending on the obtained wind field and on the conditions at the vent (granulometry, mass flow rate, etc.). All these data are used as input for the ash dispersion model(s). Any model able to face physical complexity and coupling processes with adequate solving times can be plugged into the system by means of an interface. Currently, the procedure contains the models HAZMAP, TEPHRA and FALL3D, the latter in both serial and parallel versions. Parallelization of FALL3D is done at two levels one for particle classes and one for spatial domain. The last step is to post-processes the model(s) outcomes to end up with homogeneous maps written on portable format files. Maps plot relevant quantities such as predicted ground load, expected deposit thickness or visual and flight safety concentration thresholds. Several applications are shown as examples.

  3. Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

    NASA Technical Reports Server (NTRS)

    Weeks, Cindy Lou

    1986-01-01

    Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.

  4. Performance Implications of Synchronization Support for Parallel FORTRAN Programs

    DTIC Science & Technology

    1991-06-17

    applications we used in this study are BDNA and FLO52. BDNA is a molecular dy- I namics simulator for biomolecules in water and it uses ordinary...parallelism structures and loop granularity. In the BDNA program, most of the parallel loops are not nested and the iterations are 200-1000 instructions long...are of concern. The BDNA curve in Figure 21 shows that for this program only 17% of all 32 I I 100 BDNA -4 FLO52 -I 80 3 CumuilatQe percentage of3

  5. Parallelization of Program to Optimize Simulated Trajectories (POST3D)

    NASA Technical Reports Server (NTRS)

    Hammond, Dana P.; Korte, John J. (Technical Monitor)

    2001-01-01

    This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process, dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.

  6. StagBL : A Scalable, Portable, High-Performance Discretization and Solver Layer for Geodynamic Simulation

    NASA Astrophysics Data System (ADS)

    Sanan, P.; Tackley, P. J.; Gerya, T.; Kaus, B. J. P.; May, D.

    2017-12-01

    StagBL is an open-source parallel solver and discretization library for geodynamic simulation,encapsulating and optimizing operations essential to staggered-grid finite volume Stokes flow solvers.It provides a parallel staggered-grid abstraction with a high-level interface in C and Fortran.On top of this abstraction, tools are available to define boundary conditions and interact with particle systems.Tools and examples to efficiently solve Stokes systems defined on the grid are provided in small (direct solver), medium (simple preconditioners), and large (block factorization and multigrid) model regimes.By working directly with leading application codes (StagYY, I3ELVIS, and LaMEM) and providing an API and examples to integrate with others, StagBL aims to become a community tool supplying scalable, portable, reproducible performance toward novel science in regional- and planet-scale geodynamics and planetary science.By implementing kernels used by many research groups beneath a uniform abstraction layer, the library will enable optimization for modern hardware, thus reducing community barriers to large- or extreme-scale parallel simulation on modern architectures. In particular, the library will include CPU-, Manycore-, and GPU-optimized variants of matrix-free operators and multigrid components.The common layer provides a framework upon which to introduce innovative new tools.StagBL will leverage p4est to provide distributed adaptive meshes, and incorporate a multigrid convergence analysis tool.These options, in addition to a wealth of solver options provided by an interface to PETSc, will make the most modern solution techniques available from a common interface. StagBL in turn provides a PETSc interface, DMStag, to its central staggered grid abstraction.We present public version 0.5 of StagBL, including preliminary integration with application codes and demonstrations with its own demonstration application, StagBLDemo. Central to StagBL is the notion of an uninterrupted pipeline from toy/teaching codes to high-performance, extreme-scale solves. StagBLDemo replicates the functionality of an advanced MATLAB-style regional geodynamics code, thus providing users with a concrete procedure to exceed the performance and scalability limitations of smaller-scale tools.

  7. VEEP - Vehicle Economy, Emissions, and Performance program

    NASA Technical Reports Server (NTRS)

    Heimburger, D. A.; Metcalfe, M. A.

    1977-01-01

    VEEP is a general-purpose discrete event simulation program being developed to study the performance, fuel economy, and exhaust emissions of a vehicle modeled as a collection of its separate components. It is written in SIMSCRIPT II.5. The purpose of this paper is to present the design methodology, describe the simulation model and its components, and summarize the preliminary results. Topics include chief programmer team concepts, the SDDL design language, program portability, user-oriented design, the program's user command syntax, the simulation procedure, and model validation.

  8. Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

    PubMed

    Bellucci, Michael A; Coker, David F

    2011-07-28

    We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency is tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics

  9. Concepts of Concurrent Programming

    DTIC Science & Technology

    1990-04-01

    to the material presented. Carriero89 Carriero, N., and Gelernter, D. " How to Write Parallel Programs : A Guide to the Perplexed." ACM...between the architectures on which programs can be executed and the application domains from which problems are drawn. Our goal is to show how programs ...Sept. 1989), 251-510. Abstract: There are four papers: 1. Programming Languages for Distributed Computing Systems (52); 2. How to Write Parallel

  10. NavP: Structured and Multithreaded Distributed Parallel Programming

    NASA Technical Reports Server (NTRS)

    Pan, Lei; Xu, Jingling

    2006-01-01

    This slide presentation reviews some of the issues around distributed parallel programming. It compares and contrast two methods of programming: Single Program Multiple Data (SPMD) with the Navigational Programming (NAVP). It then reviews the distributed sequential computing (DSC) method and the methodology of NavP. Case studies are presented. It also reviews the work that is being done to enable the NavP system.

  11. High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Saini, Subhash; Grassi, Charles

    1994-01-01

    The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under Cray Research adaptive Fortran (CRAFT) model four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI s PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.

  12. On program restructuring, scheduling, and communication for parallel processor systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Polychronopoulos, Constantine D.

    1986-08-01

    This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler was used to transform programs in a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, thesemore » algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.« less

  13. SITE CHARACTERIZATION LIBRARY VERSION 3.0

    EPA Science Inventory

    The Site Characterization Library is a CD that provides a centralized, field-portable source for site characterization information. Version 3 of the Site Characterization Library contains additional (from earlier versions) electronic documents and computer programs related to th...

  14. 14 CFR 91.1067 - Initial and recurrent flight attendant crewmember testing requirements.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... emergency; (d) Briefing of passengers; (e) Location and operation of portable fire extinguishers and other... person to move rapidly to an exit in an emergency as prescribed by the program manager's operations...

  15. 14 CFR 91.1067 - Initial and recurrent flight attendant crewmember testing requirements.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... emergency; (d) Briefing of passengers; (e) Location and operation of portable fire extinguishers and other... person to move rapidly to an exit in an emergency as prescribed by the program manager's operations...

  16. PORTABLE TECHNOLOGIES FOR MEASURING LEADING IN DUST

    EPA Science Inventory

    The U.S. Environmental Protection Agency (EPA) Environmental Technology Verification (ETV) program evaluates the performance of innovative air, water, pollution prevention and monitoring technologies that have the potential to improve human health and the environment. This techn...

  17. Development open source microcontroller based temperature data logger

    NASA Astrophysics Data System (ADS)

    Abdullah, M. H.; Che Ghani, S. A.; Zaulkafilai, Z.; Tajuddin, S. N.

    2017-10-01

    This article discusses the development stages in designing, prototyping, testing and deploying a portable open source microcontroller based temperature data logger for use in rough industrial environment. The 5V powered prototype of data logger is equipped with open source Arduino microcontroller for integrating multiple thermocouple sensors with their module, secure digital (SD) card storage, liquid crystal display (LCD), real time clock and electronic enclosure made of acrylic. The program for the function of the datalogger is programmed so that 8 readings from the thermocouples can be acquired within 3 s interval and displayed on the LCD simultaneously. The recorded temperature readings at four different points on both hydrodistillation show similar profile pattern and highest yield of extracted oil was achieved on hydrodistillation 2 at 0.004%. From the obtained results, this study achieved the objective of developing an inexpensive, portable and robust eight channels temperature measuring module with capabilities to monitor and store real time data.

  18. Status of parallel Python-based implementation of UEDGE

    NASA Astrophysics Data System (ADS)

    Umansky, M. V.; Pankin, A. Y.; Rognlien, T. D.; Dimits, A. M.; Friedman, A.; Joseph, I.

    2017-10-01

    The tokamak edge transport code UEDGE has long used the code-development and run-time framework Basis. However, with the support for Basis expected to terminate in the coming years, and with the advent of the modern numerical language Python, it has become desirable to move UEDGE to Python, to ensure its long-term viability. Our new Python-based UEDGE implementation takes advantage of the portable build system developed for FACETS. The new implementation gives access to Python's graphical libraries and numerical packages for pre- and post-processing, and support of HDF5 simplifies exchanging data. The older serial version of UEDGE has used for time-stepping the Newton-Krylov solver NKSOL. The renovated implementation uses backward Euler discretization with nonlinear solvers from PETSc, which has the promise to significantly improve the UEDGE parallel performance. We will report on assessment of some of the extended UEDGE capabilities emerging in the new implementation, and will discuss the future directions. Work performed for U.S. DOE by LLNL under contract DE-AC52-07NA27344.

  19. Modelling parallel programs and multiprocessor architectures with AXE

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.

    1991-01-01

    AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate for parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior is described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.

  20. Web Based Parallel Programming Workshop for Undergraduate Education.

    ERIC Educational Resources Information Center

    Marcus, Robert L.; Robertson, Douglass

    Central State University (Ohio), under a contract with Nichols Research Corporation, has developed a World Wide web based workshop on high performance computing entitled "IBN SP2 Parallel Programming Workshop." The research is part of the DoD (Department of Defense) High Performance Computing Modernization Program. The research…

  1. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    NASA Technical Reports Server (NTRS)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for highend computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language that is, a programming language that is closer to a human s way of thinking than to a machine s. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequen tial/singlecore code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify- Produce (CSP) and Normalize-Trans - pose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.

  2. Effect of Computer-Delivered Testing on Achievement in a Mastery Learning Course of Study with Partial Scoring and Variable Pacing.

    ERIC Educational Resources Information Center

    Evans, Richard M.; Surkan, Alvin J.

    The recent arrival of portable computer systems with high-level language interpreters now makes it practical to rapidly develop complex testing and scoring programs. These programs permit undergraduates access, at arbitrary times, to testing as an integral part of a mastery learning strategy. Effects of introducing the computer were studied by…

  3. Consumer server: A UNIX based event distributor in new CDF data acquisition system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abe, F.; Morita, Y.; Nomachi, M.

    1994-12-31

    Consumer Server is a program to handle event data and consumer trigger requests I/Os among Level 3 farm and consumer processes in CDF new data acquisition system. This program uses standard UNIX libraries and commercial network technologies to obtain higher portability. The authors describe the concept and configuration of the Consumer Server and report its performance.

  4. Instrumentation, performance visualization, and debugging tools for multiprocessors

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.

    1991-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging, and tuning parallel programs becomes intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.

  5. Parallel computation with the force

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1985-01-01

    A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiproceossor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.

  6. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta J.; Gimenez, J.; Caubet, J.

    2003-01-01

    Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.

  7. 76 FR 62808 - Pilot Program for Parallel Review of Medical Products

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-11

    ... voluntary participation in the pilot program, as well as the guiding principles the Agencies intend to... 57045), parallel review is intended to reduce the time between FDA marketing approval and CMS national...

  8. Neuropsychiatry and neuroscience education of psychiatry trainees: attitudes and barriers.

    PubMed

    Benjamin, Sheldon; Travis, Michael J; Cooper, Joseph J; Dickey, Chandlee C; Reardon, Claudia L

    2014-04-01

    The American Association of Directors of Psychiatric Residency Training (AADPRT) Task Force on Neuropsychiatry and Neuroscience Education of Psychiatry Residents was established in 2011 with the charge to seek information about what the field of psychiatry considers the core topics in neuropsychiatry and neuroscience to which psychiatry residents should be exposed; whether there are any "competencies" in this area on which the field agrees; whether psychiatry departments have the internal capacity to teach these topics if they are desirable; and what the reception would be for "portable curricula" in neuroscience. The task force reviewed the literature and developed a survey instrument to be administered nationwide to all psychiatry residency program directors. The AADPRT Executive Committee assisted with the survey review, and their feedback was incorporated into the final instrument. In 2011-2012, 226 adult and child and adolescent psychiatry residency program directors responded to the survey, representing over half of all US adult and child psychiatry training directors. About three quarters indicated that faculty resources were available in their departments but 39% felt the lack of neuropsychiatry faculty and 36% felt the absence of neuroscience faculty to be significant barriers. Respectively, 64 and 60% felt that neuropsychiatry and psychiatric neuroscience knowledge were very important or critically important to the provision of excellent care. Ninety-two percent were interested in access to portable neuroscience curricula. There is widespread agreement among training directors on the importance of neuropsychiatry and neuroscience knowledge to general psychiatrists but barriers to training exist, including some programs that lack faculty resources and a dearth of portable curricula in these areas.

  9. Algorithms and programming tools for image processing on the MPP

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.

    1985-01-01

    Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.

  10. Execution models for mapping programs onto distributed memory parallel computers

    NASA Technical Reports Server (NTRS)

    Sussman, Alan

    1992-01-01

    The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.

  11. Program Correctness, Verification and Testing for Exascale (Corvette)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sen, Koushik; Iancu, Costin; Demmel, James W

    The goal of this project is to provide tools to assess the correctness of parallel programs written using hybrid parallelism. There is a dire lack of both theoretical and engineering know-how in the area of finding bugs in hybrid or large scale parallel programs, which our research aims to change. In the project we have demonstrated novel approaches in several areas: 1. Low overhead automated and precise detection of concurrency bugs at scale. 2. Using low overhead bug detection tools to guide speculative program transformations for performance. 3. Techniques to reduce the concurrency required to reproduce a bug using partialmore » program restart/replay. 4. Techniques to provide reproducible execution of floating point programs. 5. Techniques for tuning the floating point precision used in codes.« less

  12. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  13. Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.

  14. Trace-Driven Debugging of Message Passing Programs

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Hood, Robert; Lopez, Louis; Bailey, David (Technical Monitor)

    1998-01-01

    In this paper we report on features added to a parallel debugger to simplify the debugging of parallel message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.

  15. Exploiting Symmetry on Parallel Architectures.

    NASA Astrophysics Data System (ADS)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry -exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered it a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  16. MODEST: A Tool for Geodesy and Astronomy

    NASA Technical Reports Server (NTRS)

    Sovers, Ojars J.; Jacobs, Christopher S.; Lanyi, Gabor E.

    2004-01-01

    Features of the JPL VLBI modeling and estimation software "MODEST" are reviewed. Its main advantages include thoroughly documented model physics, portability, and detailed error modeling. Two unique models are included: modeling of source structure and modeling of both spatial and temporal correlations in tropospheric delay noise. History of the code parallels the development of the astrometric and geodetic VLBI technique and the software retains many of the models implemented during its advancement. The code has been traceably maintained since the early 1980s, and will continue to be updated with recent IERS standards. Scripts are being developed to facilitate user-friendly data processing in the era of e-VLBI.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Allan, Benjamin A.

    We report on the use and design of a portable, extensible performance data collection tool motivated by modeling needs of the high performance computing systems co-design com- munity. The lightweight performance data collectors with Eiger support is intended to be a tailorable tool, not a shrink-wrapped library product, as pro ling needs vary widely. A single code markup scheme is reported which, based on compilation ags, can send perfor- mance data from parallel applications to CSV les, to an Eiger mysql database, or (in a non-database environment) to at les for later merging and loading on a host with mysqlmore » available. The tool supports C, C++, and Fortran applications.« less

  18. Dynamic multistation photometer

    DOEpatents

    Bauer, Martin L.; Johnson, Wayne F.; Lakomy, Dale G.

    1977-01-01

    A portable fast analyzer is provided that uses a magnetic clutch/brake to rapidly accelerate the analyzer rotor, and employs a microprocessor for automatic analyzer operation. The rotor is held stationary while the drive motor is run up to speed. When it is desired to mix the sample(s) and reagent(s), the brake is deenergized and the clutch is energized wherein the rotor is very rapidly accelerated to the running speed. The parallel path rotor that is used allows the samples and reagents to be mixed the moment they are spun out into the rotor cuvetes and data acquisition begins immediately. The analyzer will thus have special utility for fast reactions.

  19. 40 CFR 204.55 - Requirements.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 26 2012-07-01 2011-07-01 true Requirements. 204.55 Section 204.55 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) NOISE ABATEMENT PROGRAMS NOISE EMISSION STANDARDS FOR CONSTRUCTION EQUIPMENT Portable Air Compressors § 204.55 Requirements. ...

  20. 40 CFR 204.55 - Requirements.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 26 2013-07-01 2013-07-01 false Requirements. 204.55 Section 204.55 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) NOISE ABATEMENT PROGRAMS NOISE EMISSION STANDARDS FOR CONSTRUCTION EQUIPMENT Portable Air Compressors § 204.55 Requirements. ...

  1. 40 CFR 204.55 - Requirements.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 25 2011-07-01 2011-07-01 false Requirements. 204.55 Section 204.55 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) NOISE ABATEMENT PROGRAMS NOISE EMISSION STANDARDS FOR CONSTRUCTION EQUIPMENT Portable Air Compressors § 204.55 Requirements. ...

  2. Technology Demonstration Summary, Chemfix Solidification/Stabilization Process, Clackamas, Oregon

    EPA Science Inventory

    ChemfIx's* patented stabilization/solidification technology was demonstrated at the Portable Equipment Salvage Company (PESC) site in Clackamas, Oregon, as part of the Superfund Innovative Technology Evaluation (SITE) program. The Chemfix process is designed to solidify and sta...

  3. Bringing Nanoscience into the K-12 Classroom

    ScienceCinema

    Dickerson, James; Camino, Fernando; Irwin, Edward

    2018-06-12

    Brookhaven Lab and a local school district collaborated to develop a nanotechnology program that brings students “into” labs at Brookhaven’s Center for Functional Nanomaterials through a portable videoconferencing system.

  4. Technology-aided leisure and communication opportunities for two post-coma persons emerged from a minimally conscious state and affected by multiple disabilities.

    PubMed

    Lancioni, Giulio E; O'Reilly, Mark F; Singh, Nirbhay N; Sigafoos, Jeff; Buonocunto, Francesca; Sacco, Valentina; Navarro, Jorge; Lanzilotti, Crocifissa; De Tommaso, Marina; Megna, Marisa; Oliva, Doretta

    2013-02-01

    This study assessed technology-aided programs for helping two post-coma persons, who had emerged from a minimally conscious state and were affected by multiple disabilities, to (a) engage with leisure stimuli and request caregiver's procedures, (b) send out and listen to text messages for communication with distant partners, and (c) combine leisure engagement and procedure requests with text messaging within the same sessions. The program for leisure engagement and procedure requests relied on the use of a portable computer with commercial software, and a microswitch for the participants' response. The program for text messaging communication involved the use of a portable computer, a GSM modem, a microswitch for the participants' response, and specifically developed software. Results indicated that the participants were successful at each of the three stages of the study, thus providing relevant evidence concerning performance achievements only minimally documented. The implications of the findings in terms of technology and practical opportunities for post-coma persons with multiple disabilities are discussed. Copyright © 2012 Elsevier Ltd. All rights reserved.

  5. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    NASA Astrophysics Data System (ADS)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL v3 No. of lines in distributed program, including test data, etc.: 59 168 No. of bytes in distributed program, including test data, etc.: 453 409 Distribution format: tar.gz Programming language: C, CUDA Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs. RAM: Tested on Problems requiring up to 4 GB per compute node. Classification: 12 External routines: MPI, CUDA, IBM Cell SDK Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: Sub-program numdiff is used for the test run.

  6. ORCA Project: Research on high-performance parallel computer programming environments. Final report, 1 Apr-31 Mar 90

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Snyder, L.; Notkin, D.; Adams, L.

    1990-03-31

    This task relates to research on programming massively parallel computers. Previous work on the Ensamble concept of programming was extended and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensamble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels; Composition of machine instruction, composition of processes, and composition of phases. It was applied to shared memory models of computations. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph D. thesis was completed, onemore » book chapter, and six conference proceedings were published.« less

  7. Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2002-01-01

    Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.

  8. The parallel programming of voluntary and reflexive saccades.

    PubMed

    Walker, Robin; McSorley, Eugene

    2006-06-01

    A novel two-step paradigm was used to investigate the parallel programming of consecutive, stimulus-elicited ('reflexive') and endogenous ('voluntary') saccades. The mean latency of voluntary saccades, made following the first reflexive saccades in two-step conditions, was significantly reduced compared to that of voluntary saccades made in the single-step control trials. The latency of the first reflexive saccades was modulated by the requirement to make a second saccade: first saccade latency increased when a second voluntary saccade was required in the opposite direction to the first saccade, and decreased when a second saccade was required in the same direction as the first reflexive saccade. A second experiment confirmed the basic effect and also showed that a second reflexive saccade may be programmed in parallel with a first voluntary saccade. The results support the view that voluntary and reflexive saccades can be programmed in parallel on a common motor map.

  9. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon functions calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline-the prototypical non-data-parallel algorithm- can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.

  10. Applications of Java and Vector Graphics to Astrophysical Visualization

    NASA Astrophysics Data System (ADS)

    Edirisinghe, D.; Budiardja, R.; Chae, K.; Edirisinghe, G.; Lingerfelt, E.; Guidry, M.

    2002-12-01

    We describe a series of projects utilizing the portability of Java programming coupled with the compact nature of vector graphics (SVG and SWF formats) for setup and control of calculations, local and collaborative visualization, and interactive 2D and 3D animation presentations in astrophysics. Through a set of examples, we demonstrate how such an approach can allow efficient and user-friendly control of calculations in compiled languages such as Fortran 90 or C++ through portable graphical interfaces written in Java, and how the output of such calculations can be packaged in vector-based animation having interactive controls and extremely high visual quality, but very low bandwidth requirements.

  11. Analysis Results for Lunar Soil Simulant Using a Portable X-Ray Fluorescence Analyzer

    NASA Technical Reports Server (NTRS)

    Boothe, R. E.

    2006-01-01

    Lunar soil will potentially be used for oxygen generation, water generation, and as filler for building blocks during habitation missions on the Moon. NASA s in situ fabrication and repair program is evaluating portable technologies that can assess the chemistry of lunar soil and lunar soil simulants. This Technical Memorandum summarizes the results of the JSC 1 lunar soil simulant analysis using the TRACeR III IV handheld x-ray fluorescence analyzer, manufactured by KeyMaster Technologies, Inc. The focus of the evaluation was to determine how well the current instrument configuration would detect and quantify the components of JSC-1.

  12. 78 FR 76628 - Pilot Program for Parallel Review of Medical Products; Extension of the Duration of the Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-18

    ...The Food and Drug Administration (FDA) and the Centers for Medicare and Medicaid Services (CMS) (the Agencies) are announcing the extension of the ``Pilot Program for Parallel Review of Medical Products.'' The Agencies have decided to continue the program as currently designed for an additional period of 2 years from the date of publication of this notice.

  13. Integrated Task and Data Parallel Programming

    NASA Technical Reports Server (NTRS)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.

  14. Integrated Task And Data Parallel Programming: Language Design

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    his research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program m. Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.

  15. I/O Parallelization for the Goddard Earth Observing System Data Assimilation System (GEOS DAS)

    NASA Technical Reports Server (NTRS)

    Lucchesi, Rob; Sawyer, W.; Takacs, L. L.; Lyster, P.; Zero, J.

    1998-01-01

    The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center (GSFC) has developed the GEOS DAS, a data assimilation system that provides production support for NASA missions and will support NASA's Earth Observing System (EOS) in the coming years. The GEOS DAS will be used to provide background fields of meteorological quantities to EOS satellite instrument teams for use in their data algorithms as well as providing assimilated data sets for climate studies on decadal time scales. The DAO has been involved in prototyping parallel implementations of the GEOS DAS for a number of years and is now embarking on an effort to convert the production version from shared-memory parallelism to distributed-memory parallelism using the portable Message-Passing Interface (MPI). The GEOS DAS consists of two main components, an atmospheric General Circulation Model (GCM) and a Physical-space Statistical Analysis System (PSAS). The GCM operates on data that are stored on a regular grid while PSAS works with observational data that are scattered irregularly throughout the atmosphere. As a result, the two components have different data decompositions. The GCM is decomposed horizontally as a checkerboard with all vertical levels of each box existing on the same processing element(PE). The dynamical core of the GCM can also operate on a rotated grid, which requires communication-intensive grid transformations during GCM integration. PSAS groups observations on PEs in a more irregular and dynamic fashion.

  16. Automatic Management of Parallel and Distributed System Resources

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  17. The Effectiveness of Videotape Programs as a Communication Tool in the Small-Scale Livestock for Rural Farming Women Project, Honduras.

    ERIC Educational Resources Information Center

    Johnson-Dean, Lynn

    This study examines economic development in Third World countries and the use of portable video systems in development projects. The study, conducted in 1985, attempts to measure the level of effectiveness of videotape programs as a communication tool for training rural subsistence women in Honduras in technical aspects of pig-keeping. Classical…

  18. Development of New Instructional Uses of the Computer in an Inner-City Comprehensive High School. Final Report.

    ERIC Educational Resources Information Center

    Lichten, William

    A three-part program investigated the use of computers at an inner-city high school. An attempt was made to introduce a digital computer for instructional purposes at the high school. A single portable teletype terminal and a simple programing language, BASIC, were used. It was found that a wide variety of students could benefit from this…

  19. Analysis, modeling, and simulation (AMS) testbed development and evaluation to support dynamic mobility applications (DMA) and active transportation and demand management (ATDM) programs — evaluation report for ATDM program. [supporting datasets - Pasadena Testbed

    DOT National Transportation Integrated Search

    2017-07-26

    This zip file contains POSTDATA.ATT (.ATT); Print to File (.PRN); Portable Document Format (.PDF); and document (.DOCX) files of data to support FHWA-JPO-16-385, Analysis, modeling, and simulation (AMS) testbed development and evaluation to support d...

  20. Manufacturing Process Applications Team (MATeam)

    NASA Technical Reports Server (NTRS)

    1978-01-01

    The activities of the Manufacturing Process Applications Team concerning the promotion of joint Industry/Federal Agency/NASA funded research and technology operating plan (RTOP) programs are reported. Direct transfers occurred in cutting tools, laser wire stripping, soldering, and portable X-ray unit technology. TROP program funding approval was obtained for the further development of the cutting tool Sialon and development of an automated nondestructive fracture toughness testing system.

  1. Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

    NASA Astrophysics Data System (ADS)

    Amooie, M. A.; Moortgat, J.

    2017-12-01

    We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.

  2. Describing, using 'recognition cones'. [parallel-series model with English-like computer program

    NASA Technical Reports Server (NTRS)

    Uhr, L.

    1973-01-01

    A parallel-serial 'recognition cone' model is examined, taking into account the model's ability to describe scenes of objects. An actual program is presented in an English-like language. The concept of a 'description' is discussed together with possible types of descriptive information. Questions regarding the level and the variety of detail are considered along with approaches for improving the serial representations of parallel systems.

  3. Development Specification for the Portable Life Support System (PLSS) Thermal Loop Pump

    NASA Technical Reports Server (NTRS)

    Anchondo, Ian; Campbell, Colin

    2017-01-01

    The AEMU Thermal Loop Pump Development Specification establishes the requirements for design, performance, and testing of the Water Pump as part of the Thermal System of the Advanced Portable Life Support System (PLSS). It is envisioned that the Thermal Loop Pump is a positive displacement pump that provides a repeatable volume of flow against a given range of back-pressures provided by the various applications. The intention is to operate the pump at a fixed speed for the given application. The primary system is made up of two identical and redundant pumps of which only one is in operation at given time. The Auxiliary Loop Pump is an identical pump design to the primary pumps but is operated at half the flow rate. Inlet positive pressure to the pumps is provided by the upstream Flexible Supply Assembly (FSA-431 and FSA-531) which are physically located inside the suit volume and pressurized by suit pressure. An integrated relief valve, placed in parallel to the pump's inlet and outlet protects the pump and loop from over-pressurization. An integrated course filter is placed upstream of the pump's inlet to provide filtration and prevent potential debris from damaging the pump.

  4. Optical Encoding Technology for Viral Screening Panels Final Report CRADA No TC02132.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lenhoff, R.; Haushalter, R.

    This was a collaborative effort between Lawrence Livermore National Security, LLC, Lawrence Livermore National Laboratory (LLNL) and Parallel Synthesis Technologies, Inc. (PSTI), to develop Optical Encoding Technology for Viral Screening Panels. The goal for this effort was to prepare a portable bead reader system that would enable the development of viral and bacterial screening panels which could be used for the detection of any desired set of bacteria or viruses in any location. The main objective was to determine if the combination of a bead-based, PCR suspension array technology, formulated from Parallume encoded beads and PSTI’s multiplex assay reader systemmore » (MARS), could provide advantages in terms of the number of simultaneously measured samples, portability, ruggedness, ease of use, accuracy, precision or cost as compared to the Luminexbased system developed at LLNL. The project underwent several no cost extensions however the overall goal of demonstrating the utility of this new system was achieved. As a result of the project a significant change to the type of bead PSTI used for the suspension system was implemented allowing better performance than the commercial Luminex system.« less

  5. 33 CFR 150.628 - How must the operator label, tag, and mark a container of hazardous material?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Workplace Safety and Health Hazard Communication Program § 150.628 How must the operator label, tag, and..., reactive and other special condition hazard warnings. The only exception is for portable containers that...

  6. Measurement of pheromone concentration using a portable electroantennogram

    Treesearch

    Kevin W. Thorpe; Alexei A. Sharov; Ksenia S. Tcheslavskaia

    2003-01-01

    Mating disruption is an increasingly important tactic against the gypsy moth in the United States. Since the full implementation of the federal Slow-the-Spread of the Gypsy Moth program in 2000, mating disruption has become the predominant method used.

  7. Small Portable Analyzer Diagnostic Equipment (SPADE) Program -- Diagnostic Software Validation

    DTIC Science & Technology

    1984-07-01

    Electronic Equipment Electromagnetic Emission and Susceptibility Requirements for the Control of Electromagnetic Interference Electromagnetic...ONLY. ORIENTATION OF DEFECT LOOKING HHO QIlILL: t -ed’-o· Significant efforts were expended to simulate spalling failures associated with naturally

  8. SITE EVALUATION OF FIELD PORTABLE PENTACHLOROPHENOL IMMUNOASSAYS

    EPA Science Inventory

    Four pentachlorophenol (PCP) enzyme immunoassays for environmental analysis have been evaluated through the U.S. EPA Superfund Innovative Technology Evaluation (SITE) program. Three assays were formatted for on-site field use and one assay could be used in a field laboratory sett...

  9. 32 CFR 701.122 - Medical records.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... THE NAVY DOCUMENTS AFFECTING THE PUBLIC DON Privacy Program § 701.122 Medical records. (a) Health Information Portability and Accountability Act (HIPAA). (1) DOD Directive 6025.18 establishes policies and assigns responsibilities for implementation of the standards for privacy of individually identifiable...

  10. 32 CFR 701.122 - Medical records.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... THE NAVY DOCUMENTS AFFECTING THE PUBLIC DON Privacy Program § 701.122 Medical records. (a) Health Information Portability and Accountability Act (HIPAA). (1) DOD Directive 6025.18 establishes policies and assigns responsibilities for implementation of the standards for privacy of individually identifiable...

  11. 10 CFR 474.2 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... ENERGY ENERGY CONSERVATION ELECTRIC AND HYBRID VEHICLE RESEARCH, DEVELOPMENT, AND DEMONSTRATION PROGRAM... storage batteries or other portable electrical energy storage devices, provided that: (1) Recharge energy... electrical energy required for an electric vehicle to travel one mile of the Highway Fuel Economy Driving...

  12. Eigensolver for a Sparse, Large Hermitian Matrix

    NASA Technical Reports Server (NTRS)

    Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

    2003-01-01

    A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).

  13. Parallelization of elliptic solver for solving 1D Boussinesq model

    NASA Astrophysics Data System (ADS)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver in solving 1D Boussinesq model is presented. Numerical solution of Boussinesq model is obtained by implementing a staggered grid scheme to continuity, momentum, and elliptic equation of Boussinesq model. Tridiagonal system emerging from numerical scheme of elliptic equation is solved by cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of parallel program, large number of grids is varied from 28 to 214. Two test cases of numerical experiment, i.e. propagation of solitary and standing wave, are proposed to evaluate the parallel program. The numerical results are verified with analytical solution of solitary and standing wave. The best speedup of solitary and standing wave test cases is about 2.07 with 214 of grids and 1.86 with 213 of grids, respectively, which are executed by using 8 threads. Moreover, the best efficiency of parallel program is 76.2% and 73.5% for solitary and standing wave test cases, respectively.

  14. 3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

    1997-12-31

    The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle.more » The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.« less

  15. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2001-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  16. Relative Debugging of Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2002-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify, the program execution with out changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  17. Lunar surface magnetometers

    NASA Technical Reports Server (NTRS)

    Dyal, P.; Gordon, D. I.

    1973-01-01

    Discussion of the properties of both the stationary and portable magnetometers used in the Apollo program to measure static and dynamic fields on the lunar surface. A stationary magnetometer is described in which the three orthogonal vector components of the magnetic field are measured by three fluxgate sensors which are located at the ends of three orthogonal booms and contain ferromagnetic cores driven to saturation by means of a periodic current. In the Apollo 16 magnetometer special high-stability ring-core sensors were used which provided an output voltage to the analog-to-digital converter which is proportional to the magnetic field. A portable magnetometer is described which consists of a set of three orthogonal fluxgate sensors mounted on top of a tripod connected to an electronics box by a ribbon cable. The above-mentioned stationary magnetometer simultaneously measured the time-varying components of the field which were later subtracted from the portable magnetometer measurements to give the desired resultant steady field values caused by the magnetized crustal material.

  18. Use of a portable voice organizer to remember therapy goals in traumatic brain injury rehabilitation: a within-subjects trial.

    PubMed

    Hart, Tessa; Hawkey, Karen; Whyte, John

    2002-12-01

    To test the efficacy of a portable voice organizer in helping people with traumatic brain injury (TBI) to recall therapy goals and plans discussed with their clinical case managers. Prospective within-subjects trial, in which individualized therapy goals were randomly assigned to intervention or no intervention. Comprehensive postacute TBI rehabilitation program. Ten people with moderate to severe TBI enrolled from 3 months to 18 years after injury. Memory for therapy goals. Clinicians generated statements describing six current therapy goals, half of which were randomly assigned to be recorded on a voice organizer during the next case management session. Participants selected three times per day to listen to the recorded goals, prompted by an alarm. One-week recall was tested using both free- and cued-recall formats. Recorded goals were recalled better than unrecorded goals and appeared to be associated with better awareness or follow-through with therapy objectives. Portable electronic devices have the potential to assist with treatment areas beyond tasks involving prospective memory.

  19. Fundus Photography in the 21st Century--A Review of Recent Technological Advances and Their Implications for Worldwide Healthcare.

    PubMed

    Panwar, Nishtha; Huang, Philemon; Lee, Jiaying; Keane, Pearse A; Chuan, Tjin Swee; Richhariya, Ashutosh; Teoh, Stephen; Lim, Tock Han; Agrawal, Rupesh

    2016-03-01

    The introduction of fundus photography has impacted retinal imaging and retinal screening programs significantly. Fundus cameras play a vital role in addressing the cause of preventive blindness. More attention is being turned to developing countries, where infrastructure and access to healthcare are limited. One of the major limitations for tele-ophthalmology is restricted access to the office-based fundus camera. Recent advances in access to telecommunications coupled with introduction of portable cameras and smartphone-based fundus imaging systems have resulted in an exponential surge in available technologies for portable fundus photography. Retinal cameras in the near future would have to cater to these needs by featuring a low-cost, portable design with automated controls and digitalized images with Web-based transfer. In this review, we aim to highlight the advances of fundus photography for retinal screening as well as discuss the advantages, disadvantages, and implications of the various technologies that are currently available.

  20. Fundus Photography in the 21st Century—A Review of Recent Technological Advances and Their Implications for Worldwide Healthcare

    PubMed Central

    Panwar, Nishtha; Huang, Philemon; Lee, Jiaying; Keane, Pearse A.; Chuan, Tjin Swee; Richhariya, Ashutosh; Teoh, Stephen; Lim, Tock Han

    2016-01-01

    Abstract Background: The introduction of fundus photography has impacted retinal imaging and retinal screening programs significantly. Literature Review: Fundus cameras play a vital role in addressing the cause of preventive blindness. More attention is being turned to developing countries, where infrastructure and access to healthcare are limited. One of the major limitations for tele-ophthalmology is restricted access to the office-based fundus camera. Results: Recent advances in access to telecommunications coupled with introduction of portable cameras and smartphone-based fundus imaging systems have resulted in an exponential surge in available technologies for portable fundus photography. Retinal cameras in the near future would have to cater to these needs by featuring a low-cost, portable design with automated controls and digitalized images with Web-based transfer. Conclusions: In this review, we aim to highlight the advances of fundus photography for retinal screening as well as discuss the advantages, disadvantages, and implications of the various technologies that are currently available. PMID:26308281

  1. Blasim: A computational tool to assess ice impact damage on engine blades

    NASA Astrophysics Data System (ADS)

    Reddy, E. S.; Abumeri, G. H.; Chamis, C. C.

    1993-04-01

    A portable computer called BLASIM was developed at NASA LeRC to assess ice impact damage on aircraft engine blades. In addition to ice impact analyses, the code also contains static, dynamic, resonance margin, and supersonic flutter analysis capabilities. Solid, hollow, superhybrid, and composite blades are supported. An optional preprocessor (input generator) was also developed to interactively generate input for BLASIM. The blade geometry can be defined using a series of airfoils at discrete input stations or by a finite element grid. The code employs a coarse, fixed finite element mesh containing triangular plate finite elements to minimize program execution time. Ice piece is modeled using an equivalent spherical objective that has a high velocity opposite that of the aircraft and parallel to the engine axis. For local impact damage assessment, the impact load is considered as a distributed force acting over a region around the impact point. The average radial strain of the finite elements along the leading edge is used as a measure of the local damage. To estimate damage at the blade root, the impact is treated as an impulse and a combined stress failure criteria is employed. Parametric studies of local and root ice impact damage, and post-impact dynamics are discussed for solid and composite blades.

  2. Hominid visitation of the Moravian Karst during the Middle-Upper Paleolithic transition: New results from Pod Hradem Cave (Czech Republic).

    PubMed

    Nejman, L; Wood, R; Wright, D; Lisá, L; Nerudová, Z; Neruda, P; Přichystal, A; Svoboda, J

    2017-07-01

    In 1956-1958, excavations of Pod Hradem Cave in Moravia (eastern Czech Republic) revealed evidence for human activity during the Middle-Upper Paleolithic transition. This spanned 25,050-44,800 cal BP and contained artefacts attributed to the Aurignacian and Szeletian cultures, including those made from porcelanite (rarely used at Moravian Paleolithic sites). Coarse grained excavation techniques and major inversions in radiocarbon dates meant that site chronology could not be established adequately. This paper documents re-excavation of Pod Hradem in 2011-2012. A comprehensive AMS dating program using ultrafiltration and ABOx-SC pre-treatments provides new insights into human occupation at Pod Hradem Cave. Fine-grained excavation reveals sedimentary units spanning approximately 20,000 years of the Early Upper Paleolithic and late Middle Paleolithic periods, thus making it the first archaeological cave site in the Czech Republic with such a sedimentary and archaeological record. Recent excavation confirms infrequent human visitation, including during the Early Aurignacian by people who brought with them portable art objects that have no parallel in the Czech Republic. Raw material diversity of lithics suggests long-distance imports and ephemeral visits by highly mobile populations throughout the EUP period. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Paralex: An Environment for Parallel Programming in Distributed Systems

    DTIC Science & Technology

    1991-12-07

    distributed systems is coni- parable to assembly language programming for traditional sequential systems - the user must resort to low-level primitives ...to accomplish data encoding/decoding, communication, remote exe- cution, synchronization , failure detection and recovery. It is our belief that... synchronization . Finally, composing parallel programs by interconnecting se- quential computations allows automatic support for heterogeneity and fault tolerance

  4. Interfacing Computer Aided Parallelization and Performance Analysis

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)

    2003-01-01

    When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.

  5. The Impact of IEEE-1076 on VHDL (Hardware Description Language)

    DTIC Science & Technology

    1988-12-01

    Portability simply refers to how machine-independent the language is defined. The Efficiency criterion looks at how fast a program compiles and how ...if a programming language is good. The evaluation was done to determine if IEEE Standard 1076-1987 was indeed a better version of VHDL than its...must be UNIX-based and be able to * use the tools that are common to that operating system such as lex [Lesk78I, yacc [John78] and the programming

  6. Research about Memory Detection Based on the Embedded Platform

    NASA Astrophysics Data System (ADS)

    Sun, Hao; Chu, Jian

    As is known to us all, the resources of memory detection of the embedded systems are very limited. Taking the Linux-based embedded arm as platform, this article puts forward two efficient memory detection technologies according to the characteristics of the embedded software. Especially for the programs which need specific libraries, the article puts forwards portable memory detection methods to help program designers to reduce human errors,improve programming quality and therefore make better use of the valuable embedded memory resource.

  7. An Object-Oriented Approach to Writing Computational Electromagnetics Codes

    NASA Technical Reports Server (NTRS)

    Zimmerman, Martin; Mallasch, Paul G.

    1996-01-01

    Presently, most computer software development in the Computational Electromagnetics (CEM) community employs the structured programming paradigm, particularly using the Fortran language. Other segments of the software community began switching to an Object-Oriented Programming (OOP) paradigm in recent years to help ease design and development of highly complex codes. This paper examines design of a time-domain numerical analysis CEM code using the OOP paradigm, comparing OOP code and structured programming code in terms of software maintenance, portability, flexibility, and speed.

  8. Optical Multi-Gas Monitor Technology Demonstration on the International Space Station

    NASA Technical Reports Server (NTRS)

    Pilgrim, Jeffrey S.; Wood, William R.; Casias, Miguel E.; Vakhtin, Andrei B.; Johnson, Michael D.; Mudgett, Paul D.

    2014-01-01

    The International Space Station (ISS) employs a suite of portable and permanently located gas monitors to insure crew health and safety. These sensors are tasked with functions ranging from fixed mass spectrometer based major constituents analysis to portable electrochemical sensor based combustion product monitoring. An all optical multigas sensor is being developed that can provide the specificity of a mass spectrometer with the portability of an electrochemical cell. The technology, developed under the Small Business Innovation Research program, allows for an architecture that is rugged, compact and low power. A four gas version called the Multi-Gas Monitor was launched to ISS in November 2013 aboard Soyuz and activated in February 2014. The portable instrument is comprised of a major constituents analyzer (water vapor, carbon dioxide, oxygen) and high dynamic range real-time ammonia sensor. All species are sensed inside the same enhanced path length optical cell with a separate vertical cavity surface emitting laser (VCSEL) targeted at each species. The prototype is controlled digitally with a field-programmable gate array/microcontroller architecture. The optical and electronic approaches are designed for scalability and future versions could add three important acid gases and carbon monoxide combustion product gases to the four species already sensed. Results obtained to date from the technology demonstration on ISS are presented and discussed.

  9. Improvement of portable computed tomography system for on-field applications

    NASA Astrophysics Data System (ADS)

    Sukrod, K.; Khoonkamjorn, P.; Tippayakul, C.

    2015-05-01

    In 2010, Thailand Institute of Nuclear Technology (TINT) received a portable Computed Tomography (CT) system from the IAEA as part of the Regional Cooperative Agreement (RCA) program. This portable CT system has been used as the prototype for development of portable CT system intended for industrial applications since then. This paper discusses the improvements in the attempt to utilize the CT system for on-field applications. The system is foreseen to visualize the amount of agarwood in the live tree trunk. The experiments adopting Am-241 as the radiation source were conducted. The Am-241 source was selected since it emits low energy gamma which should better distinguish small density differences of wood types. Test specimens made of timbers with different densities were prepared and used in the experiments. The cross sectional views of the test specimens were obtained from the CT system using different scanning parameters. It is found from the experiments that the results are promising as the picture can clearly differentiate wood types according to their densities. Also, the optimum scanning parameters were determined from the experiments. The results from this work encourage the research team to advance into the next phase which is to experiment with the real tree on the field.

  10. LDRD final report on massively-parallel linear programming : the parPCx system.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

    2005-02-01

    This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less

  11. Dual and parallel postdoctoral training programs: implications for the osteopathic medical profession.

    PubMed

    Burkhart, Diane N; Lischka, Terri A

    2011-04-01

    Students in colleges of osteopathic medicine have several options when considering postdoctoral training programs. In addition to training programs approved solely by the American Osteopathic Association or accredited solely by the Accreditation Council for Graduate Medical Education (ACGME), students can pursue programs accredited by both organizations (ie, dually accredited programs) or osteopathic programs that occur side-by-side with ACGME programs (ie, parallel programs). In the present article, we report on the availability and growth of these 2 training options and describe their benefits and drawbacks for trainees and the osteopathic medical profession as a whole.

  12. 40 CFR 204.55-3 - Configuration identification.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... PROGRAMS NOISE EMISSION STANDARDS FOR CONSTRUCTION EQUIPMENT Portable Air Compressors § 204.55-3... the following parameters: (1) The compressor type (screw, sliding vane, etc.). (2) Number of compressor stages. (3) Maximum pressure (psi). (4) Air intake system of compressor: (i) Number of filters...

  13. ENVIRONMENTAL TECHNOLOGY VERIFICATION REPORT - PORTABLE GAS CHROMATOGRAPH - PERKIN-ELMER PHOTOVAC, INC. VOYAGOR

    EPA Science Inventory

    The U.S Environmental Protection Agency, through the Environmental Technology Verification Program, is working to accelerate the acceptance and use of innovative technologies that improve the way the United States manages its environmental problems. Reports document the performa...

  14. ENVIRONMENTAL TECHNOLOGY VERIFICATION REPORT - PORTABLE GAS CHROMATOGRAPH - SENTEX SYSTEMS, INC. SCENTOGRAPH PLUS II

    EPA Science Inventory

    The U.S. Environmental Protection Agency, through the Environmental Technology Verification Program, is working to accelerate the acceptance and use of innovative technologies that improve the way the United States manages its environmental problems. This report documents demons...

  15. 78 FR 78939 - 36(b)(1) Arms Sales Notification

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-27

    ... Quantity or Quantities of Articles or Services under Consideration for Purchase: C-130J technical, engineering and software support; software updates and patches; familiarization training for Portable Flight... and contractor technical support services; and other related elements of logistics and program support...

  16. The paradigm compiler: Mapping a functional language for the connection machine

    NASA Technical Reports Server (NTRS)

    Dennis, Jack B.

    1989-01-01

    The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.

  17. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.

  18. Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.

  19. Exploiting loop level parallelism in nonprocedural dataflow programs

    NASA Technical Reports Server (NTRS)

    Gokhale, Maya B.

    1987-01-01

    Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.

  20. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    PubMed

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.

  1. Exploring types of play in an adapted robotics program for children with disabilities.

    PubMed

    Lindsay, Sally; Lam, Ashley

    2018-04-01

    Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, where all but one showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remainder of the children had mixed play types play (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative, and less solitary play as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO ® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation Educators and clinicians working with children who have disabilities should consider the potential of LEGO ® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence their ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.

  2. Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

    USDA-ARS?s Scientific Manuscript database

    This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...

  3. RAND Workshop on Antiproton Science and Technology, Annotated Executive Summary. (October 6-9, 1987)

    DTIC Science & Technology

    1988-10-01

    parity violation to condensed matter . A number of near-term important applications are possible using the source and portable storage devices...from charge parity violation studies to condensed matter studies. -vi - The CERN/LEAR facility will continue to only scratch the surface of important...technology programs. These technology programs include possible small tools to study extreme states of matter ;, a propulsion test facility for

  4. NRL Radar Division C++ Coding Standard

    DTIC Science & Technology

    2016-12-05

    The coding standard provides tools aimed at helping C++ programmers develop programs that are free of common types of errors, maintainable by...different programmers , portable to other operating systems, easy to read and understand, and have a consistent style. Questions of design, such as how to...mandatory for any organization with quality goals. The purpose of this standard is to provide tools aimed at helping C++ programmers develop programs that

  5. Graphic analysis of resources by numerical evaluation techniques (Garnet)

    USGS Publications Warehouse

    Olson, A.C.

    1977-01-01

    An interactive computer program for graphical analysis has been developed by the U.S. Geological Survey. The program embodies five goals, (1) economical use of computer resources, (2) simplicity for user applications, (3) interactive on-line use, (4) minimal core requirements, and (5) portability. It is designed to aid (1) the rapid analysis of point-located data, (2) structural mapping, and (3) estimation of area resources. ?? 1977.

  6. MPS Internships in Public Science Education: Sensing the Radio Sky

    NASA Astrophysics Data System (ADS)

    Blake, Melvin; Castelaz, M. W.; Moffett, D.; Walsh, L.; LaFratta, M.

    2006-12-01

    The intent of the “Sensing the Radio Sky” program is to teach high school students the concepts and relevance of radio astronomy through presentations in STARLAB portable planetariums. The two year program began in the summer of 2004 and was completed in December 2006. The program involved a team of 12 undergraduate physics and multimedia majors and four faculty mentors from Furman University, University of North Carolina-Asheville and Pisgah Astronomical Research Institute (PARI). One component of the program is the development and production of a projection cylinder for the portable STARLAB planetariums. The cylinder gives a thorough view of the Milky Way and of several other celestial sources in radio wavelengths, yet these images are difficult to perceive without prior knowledge of radio astronomy. Consequently, the Radio Sky team created a multimedia presentation to accompany the cylinder. This multimedia component contains six informative lessons on radio astronomy assembled by the physics interns and numerous illustrations and animations created by the multimedia interns. The cylinder and multimedia components complement each other and provide a unique, thorough, and highly intelligible perspective on radio astronomy. The final draft is complete and will be sent to Learning Technologies, Inc., for marketing to owners of STARLAB planetariums throughout the world. We acknowledge support from the NSF Internship in Public Science Education Program grant number 0324729.

  7. Educational intervention together with an on-line quality control program achieve recommended analytical goals for bedside blood glucose monitoring in a 1200-bed university hospital.

    PubMed

    Sánchez-Margalet, Víctor; Rodriguez-Oliva, Manuel; Sánchez-Pozo, Cristina; Fernández-Gallardo, María Francisca; Goberna, Raimundo

    2005-01-01

    Portable meters for blood glucose concentrations are used at the patients bedside, as well as by patients for self-monitoring of blood glucose. Even though most devices have important technological advances that decrease operator error, the analytical goals proposed for the performance of glucose meters have been recently changed by the American Diabetes Association (ADA) to reach <5% analytical error and <7.9% total error. We studied 80 meters throughout the Virgen Macarena Hospital and we found most devices with performance error higher than 10%. The aim of the present study was to establish a new system to control portable glucose meters together with an educational program for nurses in a 1200-bed University Hospital to achieve recommended analytical goals, so that we could improve the quality of diabetes care. We used portable glucose meters connected on-line to the laboratory after an educational program for nurses with responsibilities in point-of-care testing. We evaluated the system by assessing total error of the glucometers using high- and low-level glucose control solutions. In a period of 6 months, we collected data from 5642 control samples obtained by 14 devices (Precision PCx) directly from the control program (QC manager). The average total error for the low-level glucose control (2.77 mmol/l) was 6.3% (range 5.5-7.6%), and even lower for the high-level glucose control (16.66 mmol/l), at 4.8% (range 4.1-6.5%). In conclusion, the performance of glucose meters used in our University Hospital with more than 1000 beds not only improved after the intervention, but the meters achieved the analytical goals of the suggested ADA/National Academy of Clinical Biochemistry criteria for total error (<7.9% in the range 2.77-16.66 mmol/l glucose) and optimal total error for high glucose concentrations of <5%, which will improve the quality of care of our patients.

  8. Portable Cooler/Warmers

    NASA Technical Reports Server (NTRS)

    1994-01-01

    Early in the space program, NASA recognized the need to replace bulky coils, compressers, and motors for refrigeration purposes by looking at existing thermoelectric technology. This effort resulted in the development of miniaturized thermoelectric components and packaging to accommodate tight confines of spacecraft. Koolatron's portable electronic refrigerators incorporate this NASA technology. Each of the cooler/warmers employs one or two miniaturized thermoelectric modules. Although each module is only the size of a book of matches, it delivers the cooling power of a 10-pound block of ice. In some models, the cooler can be converted to a warmer. There are no moving parts. The Koolatrons can be plugged into auto cigarette lighters, recreational vehicles, boats or motel outlets.

  9. A Portable Electronic Nose For Toxic Vapor Detection, Identification, and Quantification

    NASA Technical Reports Server (NTRS)

    Linnell, B. R.; Young, R. C.; Griffin, T. P.; Meneghelli, B. J.; Peterson, B. V.; Brooks, K. B.

    2005-01-01

    A new prototype instrument based on electronic nose (e-nose) technology has demonstrated the ability to identify and quantify many vapors of interest to the Space Program at their minimum required concentrations for both single vapors and two-component vapor mixtures, and may easily be adapted to detect many other toxic vapors. To do this, it was necessary to develop algorithms to classify unknown vapors, recognize when a vapor is not any of the vapors of interest, and estimate the concentrations of the contaminants. This paper describes the design of the portable e-nose instrument, test equipment setup, test protocols, pattern recognition algorithms, concentration estimation methods, and laboratory test results.

  10. Rapidly deployable emergency communication system

    DOEpatents

    Gladden, Charles A.; Parelman, Martin H.

    1979-01-01

    A highly versatile, highly portable emergency communication system which permits deployment in a very short time to cover both wide areas and distant isolated areas depending upon mission requirements. The system employs a plurality of lightweight, fully self-contained repeaters which are deployed within the mission area to provide communication between field teams, and between each field team and a mobile communication control center. Each repeater contains a microcomputer controller, the program for which may be changed from the control center by the transmission of digital data within the audible range (300-3,000 Hz). Repeaters are accessed by portable/mobile transceivers, other repeaters, and the control center through the transmission and recognition of digital data code words in the subaudible range.

  11. Multiprocessor smalltalk: Implementation, performance, and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pallas, J.I.

    1990-01-01

    Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possiblemore » to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools-serialization, replication, and reorganization-adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.« less

  12. 42 CFR 410.33 - Independent diagnostic testing facility.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... SERVICES MEDICARE PROGRAM SUPPLEMENTARY MEDICAL INSURANCE (SMI) BENEFITS Medical and Other Health Services... supplier of portable x-ray services, a nurse practitioner, or a clinical nurse specialist when he or she... electrophysiologic clinical specialist and permitted to provide the service under State law. (b) Supervising...

  13. 42 CFR 410.33 - Independent diagnostic testing facility.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... SERVICES MEDICARE PROGRAM SUPPLEMENTARY MEDICAL INSURANCE (SMI) BENEFITS Medical and Other Health Services... supplier of portable x-ray services, a nurse practitioner, or a clinical nurse specialist when he or she... electrophysiologic clinical specialist and permitted to provide the service under State law. (b) Supervising...

  14. 42 CFR 410.33 - Independent diagnostic testing facility.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... SERVICES MEDICARE PROGRAM SUPPLEMENTARY MEDICAL INSURANCE (SMI) BENEFITS Medical and Other Health Services... supplier of portable x-ray services, a nurse practitioner, or a clinical nurse specialist when he or she... electrophysiologic clinical specialist and permitted to provide the service under State law. (b) Supervising...

  15. Carpentry Specialist.

    ERIC Educational Resources Information Center

    Air Force Training Command, Sheppard AFB, TX.

    This instructional package is intended for use in training Air Force personnel enrolled in a program for apprentice carpenters. Training includes an introduction to carpentry and provides instruction in the use of carpentry hand, portable power, and shop tools; construction and maintenance of wood structures; installation of building hardware; and…

  16. ENVIRONMENTAL TECHNOLOGY VERIFICATION REPORT: CONSTELLATION TECHNOLOGY CORPORATION - CT-1128 PORTABLE GAS CHROMATOGRAPH-MASS SPECTROMETER

    EPA Science Inventory

    The Environmental Technology Verification (ETV) Program, beginning as an initiative of the U.S. Environmental Protection Agency (EPA) in 1995, verifies the performance of commercially available, innovative technologies that can be used to measure environmental quality. The ETV p...

  17. 40 CFR 165.25 - Nonrefillable container standards.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... PROGRAMS PESTICIDE MANAGEMENT AND DISPOSAL Nonrefillable Container Standards: Container Design and Residue... 171.8 must be packaged in a nonrefillable container that, if portable, is designed, constructed, and... are applicable to a Packing Group III material, or, if subject to a special permit, according to the...

  18. 40 CFR 165.25 - Nonrefillable container standards.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... PROGRAMS PESTICIDE MANAGEMENT AND DISPOSAL Nonrefillable Container Standards: Container Design and Residue... 171.8 must be packaged in a nonrefillable container that, if portable, is designed, constructed, and... are applicable to a Packing Group III material, or, if subject to a special permit, according to the...

  19. 40 CFR 165.25 - Nonrefillable container standards.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... PROGRAMS PESTICIDE MANAGEMENT AND DISPOSAL Nonrefillable Container Standards: Container Design and Residue... 171.8 must be packaged in a nonrefillable container that, if portable, is designed, constructed, and... are applicable to a Packing Group III material, or, if subject to a special permit, according to the...

  20. 40 CFR 165.25 - Nonrefillable container standards.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... PROGRAMS PESTICIDE MANAGEMENT AND DISPOSAL Nonrefillable Container Standards: Container Design and Residue... 171.8 must be packaged in a nonrefillable container that, if portable, is designed, constructed, and... are applicable to a Packing Group III material, or, if subject to a special permit, according to the...

Top