pure parallel program: Topics by Science.gov

Sample records for pure parallel program

Augmenting The HST Pure Parallel Observations

NASA Astrophysics Data System (ADS)

Patterson, Alan; Soutchkova, G.; Workman, W.

2012-05-01

Pure Parallel (PP) programs, designated GO/PAR, are a subgroup of General Observer (GO) programs. PP observations execute simultaneously with the prime GO observations to which they are "attached". They can be performed with ACS/WFC, WFC3/UVIS or WFC3/IR and can be attached only to GO visits in which the instruments are either COS or STIS. The current HST Parallel Observation Processing System (POPS) was introduced after Servicing Mission 4. It increased HST productivity by 10% in terms of the utilization of HST prime orbits and was highly appreciated by HST observers, allowing them to design efficient, multi-orbit survey projects for collecting large amounts of data on identifiable targets. The results of the WFC3 Infrared Spectroscopic Parallel Survey (WISP), Hubble Infrared Pure Parallel Imaging Extragalactic Survey (HIPPIES), and The Brightest-of-Reionizing Galaxies Pure Parallel Survey (BoRG) exemplify this benefit. In Cycle 19, however, the full advantage of GO/PARs came under threat. Whereas each of the previous cycles provided over one million seconds of exposure time for PP, in Cycle 19 that number fell to 680,000 seconds. This dramatic decline occurred because of fundamental changes in the construction of COS prime observations. To preserve the science output of PP, the PP Working Group was tasked to find a way to recover the lost time and maximize the total time available for PP observing. The solution was to expand the definition of a PP opportunity to allow PP exposures to span one or more primary exposure readouts. So, starting in HST Cycle 20, PP opportunities will no longer be limited to GO visits with a single uninterrupted exposure in an orbit. The resulting enhancements in HST Cycle 20 to the PP opportunity identification and matching process are expected to restore the PP time to previously achieved and possibly even greater levels.
PyPele Rewritten To Use MPI

NASA Technical Reports Server (NTRS)

Hockney, George; Lee, Seungwon

2008-01-01

A computer program known as PyPele, originally written as a Python-language extension module of a C++ language program, has been rewritten in pure Python language. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message-passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without need to rewrite those programs.
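The embarrassingly parallel mode mentioned in this record can be illustrated with a minimal sketch. It is not PyPele's interface, which the abstract does not describe: the function names are hypothetical, and the MPI ranks are simulated by an ordinary loop so that the example runs without an MPI installation. It shows the round-robin task partitioning that MPI task dispatchers commonly use.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def tasks_for_rank(tasks: Sequence[T], rank: int, size: int) -> List[T]:
    """Return the tasks owned by `rank` under round-robin partitioning."""
    return list(tasks[rank::size])


def run_embarrassingly_parallel(
    tasks: Sequence[T], work: Callable[[T], R], size: int
) -> List[R]:
    """Simulate `size` independent ranks, then gather results in task order.

    In a real MPI program each rank would run only its own inner loop, and a
    gather step would collect the partial results on one rank.
    """
    results: List[R] = [None] * len(tasks)  # type: ignore[list-item]
    for rank in range(size):  # stands in for `size` concurrent processes
        for offset, task in enumerate(tasks_for_rank(tasks, rank, size)):
            # Task `rank + offset * size` is the one this rank picked up.
            results[rank + offset * size] = work(task)
    return results


# Hypothetical workload: every task is independent of every other task.
squares = run_embarrassingly_parallel(list(range(10)), lambda x: x * x, size=4)
```

Because no task depends on another, the ranks never need to communicate until the final gather, which is what makes the workload embarrassingly parallel.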
chemf: A purely functional chemistry toolkit.

PubMed

Höck, Stefan; Riedl, Rainer

2012-12-20

Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, these languages offer a lot of support when writing immutable data structures and side-effects free code. We therefore started implementing our own toolkit based on the above programming paradigms in a modern, versatile programming language. We present our initial results with functional programming in chemistry by first describing an immutable data structure for molecular graphs together with a couple of simple algorithms to calculate basic molecular properties before writing a complete SMILES parser in accordance with the OpenSMILES specification. Along the way we show how to deal with input validation, error handling, bulk operations, and parallelization in a purely functional way. At the end we also analyze and improve our algorithms and data structures in terms of performance and compare it to existing toolkits both object-oriented and purely functional. All code was written in Scala, a modern multi-paradigm programming language with a strong support for functional programming and a highly sophisticated type system. We have successfully made the first important steps towards a purely functional chemistry toolkit. The data structures and algorithms presented in this article perform well while at the same time they can be safely used in parallelized applications, such as computer aided drug design experiments, without further adjustments. This stands in contrast to existing object-oriented toolkits where thread safety of data structures and algorithms is a deliberate design decision that can be hard to implement. 
Finally, the level of type-safety achieved by Scala highly increased the reliability of our code as well as the productivity of the programmers involved in this project.
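The idea of an immutable molecular graph with simple property calculations can be sketched as follows. The toolkit itself is written in Scala; this Python illustration uses hypothetical class and method names and a small table of approximate atomic masses, and it is not the chemf data structure.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Approximate standard atomic masses in g/mol for a few elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


@dataclass(frozen=True)
class Molecule:
    """Immutable molecular graph: atoms are nodes, bonds are index pairs."""

    atoms: Tuple[str, ...] = ()
    bonds: FrozenSet[Tuple[int, int]] = frozenset()

    def add_atom(self, symbol: str) -> "Molecule":
        # Returns a new value; the receiver is never modified.
        return Molecule(self.atoms + (symbol,), self.bonds)

    def add_bond(self, i: int, j: int) -> "Molecule":
        n = len(self.atoms)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError("a bond must join two distinct existing atoms")
        return Molecule(self.atoms, self.bonds | {(min(i, j), max(i, j))})

    def mass(self) -> float:
        """Sum of the atomic masses of all atoms in the graph."""
        return sum(ATOMIC_MASS[a] for a in self.atoms)

    def degree(self, i: int) -> int:
        """Number of bonds that involve atom `i`."""
        return sum(1 for bond in self.bonds if i in bond)


# Water with explicit hydrogens: H-O-H.
empty = Molecule()
water = (
    empty.add_atom("O").add_atom("H").add_atom("H").add_bond(0, 1).add_bond(0, 2)
)
```

Since every operation returns a new `Molecule`, a value such as `water` can be shared between threads without locks, which is the property the abstract relies on for parallelized applications.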
chemf: A purely functional chemistry toolkit

PubMed Central

2012-01-01

Background Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, these languages offer a lot of support when writing immutable data structures and side-effects free code. We therefore started implementing our own toolkit based on the above programming paradigms in a modern, versatile programming language. Results We present our initial results with functional programming in chemistry by first describing an immutable data structure for molecular graphs together with a couple of simple algorithms to calculate basic molecular properties before writing a complete SMILES parser in accordance with the OpenSMILES specification. Along the way we show how to deal with input validation, error handling, bulk operations, and parallelization in a purely functional way. At the end we also analyze and improve our algorithms and data structures in terms of performance and compare it to existing toolkits both object-oriented and purely functional. All code was written in Scala, a modern multi-paradigm programming language with a strong support for functional programming and a highly sophisticated type system. Conclusions We have successfully made the first important steps towards a purely functional chemistry toolkit. The data structures and algorithms presented in this article perform well while at the same time they can be safely used in parallelized applications, such as computer aided drug design experiments, without further adjustments. This stands in contrast to existing object-oriented toolkits where thread safety of data structures and algorithms is a deliberate design decision that can be hard to implement. 
Finally, the level of type-safety achieved by Scala highly increased the reliability of our code as well as the productivity of the programmers involved in this project. PMID:23253942
PCSIM: A Parallel Simulation Environment for Neural Circuits Fully Integrated with Python

PubMed Central

Pecevski, Dejan; Natschläger, Thomas; Schuch, Klaus

2008-01-01

The Parallel Circuit SIMulator (PCSIM) is a software package for simulation of neural circuits. It is primarily designed for distributed simulation of large scale networks of spiking point neurons. Although its computational core is written in C++, PCSIM's primary interface is implemented in the Python programming language, which is a powerful programming environment and allows the user to easily integrate the neural circuit simulator with data analysis and visualization tools to manage the full neural modeling life cycle. The main focus of this paper is to describe PCSIM's full integration into Python and the benefits thereof. In particular we will investigate how the automatically generated bidirectional interface and PCSIM's object-oriented modular framework enable the user to adopt a hybrid modeling approach: using and extending PCSIM's functionality either employing pure Python or C++ and thus combining the advantages of both worlds. Furthermore, we describe several supplementary PCSIM packages written in pure Python and tailored towards setting up and analyzing neural simulations. PMID:19543450
Message Passing and Shared Address Space Parallelism on an SMP Cluster

NASA Technical Reports Server (NTRS)

Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
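The difference between the two programming models can be shown with a small sketch. It uses Python threads inside one process as stand-ins for processors, so it illustrates only the programming style, not the performance behavior measured in the paper; all function names are hypothetical.

```python
import queue
import threading
from typing import List, Sequence


def chunks(data: Sequence[int], n: int) -> List[Sequence[int]]:
    """Split `data` into at most `n` contiguous slices of nearly equal size."""
    step = max(1, (len(data) + n - 1) // n)
    return [data[i : i + step] for i in range(0, len(data), step)]


def sum_shared_address_space(data: Sequence[int], workers: int) -> int:
    """Workers update one shared variable, guarded by a lock."""
    total = [0]  # shared mutable state visible to every thread
    lock = threading.Lock()

    def work(part: Sequence[int]) -> None:
        partial = sum(part)
        with lock:  # synchronization is the programmer's responsibility
            total[0] += partial

    threads = [threading.Thread(target=work, args=(c,)) for c in chunks(data, workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def sum_message_passing(data: Sequence[int], workers: int) -> int:
    """Workers share nothing and send partial sums as explicit messages."""
    mailbox: "queue.Queue[int]" = queue.Queue()

    def work(part: Sequence[int]) -> None:
        mailbox.put(sum(part))  # the only way results leave a worker

    parts = chunks(data, workers)
    threads = [threading.Thread(target=work, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in parts)


numbers = list(range(1000))
sas_total = sum_shared_address_space(numbers, workers=4)
mp_total = sum_message_passing(numbers, workers=4)
```

In the shared-address-space version the data exchange is implicit and the lock is easy to forget; in the message-passing version every exchange is spelled out, which is more code but leaves no hidden shared state.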
An engineering approach to automatic programming

NASA Technical Reports Server (NTRS)

Rubin, Stuart H.

1990-01-01

An exploratory study of the automatic generation and optimization of symbolic programs was undertaken using DECOM, a prototypical requirement specification model implemented in pure LISP. It was concluded, on the basis of this study, that symbolic processing languages such as LISP can support a style of programming based upon formal transformation and dependent upon the expression of constraints in an object-oriented environment. Such languages can represent all aspects of the software generation process (including heuristic algorithms for effecting parallel search) as dynamic processes since data and program are represented in a uniform format.
Fortran code for SU(3) lattice gauge theory with and without MPI checkerboard parallelization

NASA Astrophysics Data System (ADS)

Berg, Bernd A.; Wu, Hao

2012-10-01

We document plain Fortran and Fortran MPI checkerboard code for Markov chain Monte Carlo simulations of pure SU(3) lattice gauge theory with the Wilson action in D dimensions. The Fortran code uses periodic boundary conditions and is suitable for pedagogical purposes and small scale simulations. For the Fortran MPI code two geometries are covered: the usual torus with periodic boundary conditions and the double-layered torus as defined in the paper. Parallel computing is performed on checkerboards of sublattices, which partition the full lattice in one, two, and so on, up to D directions (depending on the parameters set). For updating, the Cabibbo-Marinari heatbath algorithm is used. We present validations and test runs of the code. Performance is reported for a number of currently used Fortran compilers and, when applicable, MPI versions. For the parallelized code, performance is studied as a function of the number of processors. Program summary Program title: STMC2LSU3MPI Catalogue identifier: AEMJ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEMJ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 26666 No. of bytes in distributed program, including test data, etc.: 233126 Distribution format: tar.gz Programming language: Fortran 77 compatible with the use of Fortran 90/95 compilers, in part with MPI extensions. Computer: Any capable of compiling and executing Fortran 77 or Fortran 90/95, when needed with MPI extensions. Operating system: Red Hat Enterprise Linux Server 6.1 with OpenMPI + pgf77 11.8-0, Centos 5.3 with OpenMPI + gfortran 4.1.2, Cray XT4 with MPICH2 + pgf90 11.2-0. Has the code been vectorised or parallelized?: Yes, parallelized using MPI extensions. Number of processors used: 2 to 11664 RAM: 200 Mega bytes per process. 
Classification: 11.5. Nature of problem: Physics of pure SU(3) Quantum Field Theory (QFT). This is relevant for our understanding of Quantum Chromodynamics (QCD). It includes the glueball spectrum, topological properties and the deconfining phase transition of pure SU(3) QFT. For instance, Relativistic Heavy Ion Collision (RHIC) experiments at the Brookhaven National Laboratory provide evidence that quarks confined in hadrons undergo, at high enough temperature and pressure, a transition into a Quark-Gluon Plasma (QGP). Investigations of its thermodynamics in pure SU(3) QFT are of interest. Solution method: Markov Chain Monte Carlo (MCMC) simulations of SU(3) Lattice Gauge Theory (LGT) with the Wilson action. This is a regularization of pure SU(3) QFT on a hypercubic lattice, which allows approaching the continuum SU(3) QFT by means of Finite Size Scaling (FSS) studies. Specifically, we provide updating routines for the Cabibbo-Marinari heatbath with and without checkerboard parallelization. While the first is suitable for pedagogical purposes and small scale projects, the latter allows for efficient parallel processing. Targeting the geometry of RHIC experiments, we have implemented a Double-Layered Torus (DLT) lattice geometry, which has previously not been used in LGT MCMC simulations and enables inside and outside layers at distinct temperatures, the lower-temperature layer acting as the outside boundary for the higher-temperature layer, where the deconfinement transition takes place. Restrictions: The checkerboard partition of the lattice makes the development of measurement programs more tedious than is the case for an unpartitioned lattice. Presently, only one measurement routine for Polyakov loops is provided. Unusual features: We provide three different versions for the send/receive function of the MPI library, which work for different operating system + compiler + MPI combinations.
This involves activating the correct row in the last three rows of our latmpi.par parameter file. The underlying reason is distinct buffer conventions. Running time: For a typical run using an Intel i7 processor, it takes (1.8-6) E-06 seconds to update one link of the lattice, depending on the compiler used. For example, if we do a simulation on a small (4 * 8^3) DLT lattice with a statistics of 2^21 sweeps (i.e., update the two lattice layers of 4 * (4 * 8^3) links each 2^21 times), the total CPU time needed can be 2 * 4 * (4 * 8^3) * 2^21 * 3 E-06 seconds = 1.7 minutes, where 2 is the number of lattice layers, 4 is the number of dimensions, 8^3 * 4 is the lattice size, 2^21 is the number of updating sweeps, and 6 E-06 s is the average time to update one link variable. If we divide the job into 8 parallel processes, then the real time is (for negligible communication overhead) 1.7 mins / 8 = 0.2 mins.
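The general idea behind a checkerboard of sublattices can be sketched in a few lines. The layout below is an assumption made for illustration (the actual partition is defined by the distributed Fortran code and its parameter files), and all names are hypothetical. Blocks are colored by the parity of their block coordinates, so that no two blocks of the same color are nearest neighbors and all blocks of one color can be updated concurrently.

```python
from itertools import product
from typing import Dict, List, Tuple

Block = Tuple[int, ...]


def color(block: Block) -> int:
    """Checkerboard color (0 or 1) from the parity of the block coordinates."""
    return sum(block) % 2


def color_blocks(blocks_per_dir: Tuple[int, ...]) -> Dict[int, List[Block]]:
    """Group all sublattice blocks by color.

    A partitioned direction must hold an even number of blocks so that the
    coloring stays consistent across the periodic boundary. A direction with
    a single block is treated as not partitioned.
    """
    if any(n > 1 and n % 2 for n in blocks_per_dir):
        raise ValueError("partitioned directions need an even number of blocks")
    groups: Dict[int, List[Block]] = {0: [], 1: []}
    for block in product(*(range(n) for n in blocks_per_dir)):
        groups[color(block)].append(block)
    return groups


def neighbors(block: Block, blocks_per_dir: Tuple[int, ...]) -> List[Block]:
    """Nearest-neighbor blocks on a periodic (torus) arrangement."""
    result = []
    for d, n in enumerate(blocks_per_dir):
        if n == 1:
            continue  # direction is not partitioned: no neighboring block
        for step in (-1, 1):
            shifted = list(block)
            shifted[d] = (block[d] + step) % n
            result.append(tuple(shifted))
    return result


# Hypothetical layout: 4 blocks in one direction, 2 in a second, and an
# unpartitioned third direction.
LAYOUT = (4, 2, 1)
groups = color_blocks(LAYOUT)
```

An update sweep would then process all blocks in `groups[0]` in parallel, exchange boundary data, and repeat for `groups[1]`.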
Real-Time Monitoring of Scada Based Control System for Filling Process

NASA Astrophysics Data System (ADS)

Soe, Aung Kyaw; Myint, Aung Naing; Latt, Maung Maung; Theingi

2008-10-01

This paper presents the design of a real-time monitoring system for a filling process using Supervisory Control and Data Acquisition (SCADA). The production process is monitored in real time using Visual Basic.Net programming under Visual Studio 2005, without dedicated SCADA software. The software integrators are programmed to get the required information for the configuration screens. Simulation of components is displayed on the computer screen, using the parallel port to connect the computers and the filling devices. The programs for real-time simulation of the filling process in the pure drinking water industry are provided.
Performance enhancement of various real-time image processing techniques via speculative execution

NASA Astrophysics Data System (ADS)

Younis, Mohamed F.; Sinha, Purnendu; Marlowe, Thomas J.; Stoyenko, Alexander D.

1996-03-01

In real-time image processing, an application must satisfy a set of timing constraints while ensuring the semantic correctness of the system. Because of the natural structure of digital data, pure data and task parallelism have been used extensively in real-time image processing to accelerate the handling time of image data. These types of parallelism are based on splitting the execution load performed by a single processor across multiple nodes. However, execution of all parallel threads is mandatory for correctness of the algorithm. On the other hand, speculative execution is an optimistic execution of part(s) of the program based on assumptions on program control flow or variable values. Rollback may be required if the assumptions turn out to be invalid. Speculative execution can enhance average, and sometimes worst-case, execution time. In this paper, we target various image processing techniques to investigate applicability of speculative execution. We identify opportunities for safe and profitable speculative execution in image compression, edge detection, morphological filters, and blob recognition.
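Speculative execution with rollback, as described in this record, can be sketched with two threads. The helper `speculate` and the toy edge-detection workload are hypothetical and are not taken from the paper; the sketch assumes both branches are free of side effects, so that discarding a result is a sufficient rollback.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def speculate(
    condition: Callable[[], bool],
    predicted: bool,
    if_true: Callable[[], T],
    if_false: Callable[[], T],
) -> T:
    """Run the predicted branch while the condition is still being evaluated.

    If the prediction was wrong, the speculative result is discarded (the
    rollback) and the other branch is executed instead.
    """
    guess = if_true if predicted else if_false
    with ThreadPoolExecutor(max_workers=2) as pool:
        cond_future = pool.submit(condition)
        spec_future = pool.submit(guess)  # optimistic work starts right away
        actual = cond_future.result()
        if actual == predicted:
            return spec_future.result()  # speculation paid off
    # Misprediction: the speculative result is dropped and the correct
    # branch is computed from scratch.
    return if_true() if actual else if_false()


# Toy workload: compute gradients only if the pixel row contains an edge.
row = [10, 12, 11, 200, 198]
has_edge = lambda: max(row) - min(row) > 50
gradient = lambda: [abs(b - a) for a, b in zip(row, row[1:])]
flat = lambda: [0] * (len(row) - 1)

edges = speculate(has_edge, predicted=True, if_true=gradient, if_false=flat)
```

A correct prediction hides the cost of evaluating the condition, while a wrong one costs the wasted speculative work plus the recomputation, which is why speculation tends to help average-case more than worst-case time.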
Algorithmic synthesis using Python compiler

NASA Astrophysics Data System (ADS)

Cieszewski, Radoslaw; Romaniuk, Ryszard; Pozniak, Krzysztof; Linczuk, Maciej

2015-09-01

This paper presents a Python-to-VHDL compiler. The compiler interprets an algorithmic description of a desired behavior written in Python and translates it to VHDL. FPGAs combine many benefits of both software and ASIC implementations. Like software, the programmed circuit is flexible and can be reconfigured over the lifetime of the system. FPGAs have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. This can be achieved by using many computational resources at the same time. Creating parallel programs implemented in FPGAs in pure HDL is difficult and time consuming. Using a higher level of abstraction and a High-Level Synthesis compiler, implementation time can be reduced. The compiler has been implemented using the Python language. This article describes the design, implementation and results of the created tools.
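To give a feel for what translating Python to VHDL involves, here is a toy sketch built on Python's standard `ast` module. It is not the compiler described in the paper: it handles only a single assignment with a few binary operators, the function names are hypothetical, and it assumes every signal has a type (for example `unsigned` from `numeric_std`) for which the emitted operators are defined.

```python
import ast

# Python operator nodes and the VHDL operator spelling used for each.
VHDL_OPS = {
    ast.Add: "+",
    ast.Sub: "-",
    ast.BitAnd: "and",
    ast.BitOr: "or",
    ast.BitXor: "xor",
}


def expr_to_vhdl(node: ast.expr) -> str:
    """Translate a tiny subset of Python expressions to VHDL expression text."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp) and type(node.op) in VHDL_OPS:
        left = expr_to_vhdl(node.left)
        right = expr_to_vhdl(node.right)
        # Parenthesize everything so Python precedence is preserved.
        return f"({left} {VHDL_OPS[type(node.op)]} {right})"
    raise NotImplementedError(f"unsupported construct: {ast.dump(node)}")


def assignment_to_vhdl(source: str) -> str:
    """Turn `target = expression` into a VHDL signal assignment."""
    body = ast.parse(source).body
    if len(body) != 1 or not isinstance(body[0], ast.Assign):
        raise NotImplementedError("expected exactly one assignment")
    stmt = body[0]
    if len(stmt.targets) != 1 or not isinstance(stmt.targets[0], ast.Name):
        raise NotImplementedError("expected a single named target")
    return f"{stmt.targets[0].id} <= {expr_to_vhdl(stmt.value)};"


vhdl_line = assignment_to_vhdl("y = (a + b) ^ c")
```

A real high-level synthesis flow also has to infer bit widths, schedule operations over clock cycles and generate entity and architecture declarations, none of which this sketch attempts.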
Research in Parallel Computing: 1987-1990

DTIC Science & Technology

1994-08-05

emulation, we layered UNIX BSD 4.3 functionality above the kernel primitives, but packaged both as a monolithic unit running in privileged state. This...further, so that only a "pure kernel" or "microkernel" runs in privileged mode, while the other components of the environment execute as one or more client...
RPython high-level synthesis

NASA Astrophysics Data System (ADS)

Cieszewski, Radoslaw; Linczuk, Maciej

2016-09-01

The development of FPGA technology and the increasing complexity of applications in recent decades have forced compilers to move to higher abstraction levels. Compilers interpret an algorithmic description of a desired behavior written in High-Level Languages (HLLs) and translate it to Hardware Description Languages (HDLs). This paper presents an RPython-based High-Level Synthesis (HLS) compiler. The compiler takes the configuration parameters and maps an RPython program to VHDL. The VHDL code can then be used to program FPGA chips. Compared with other technologies, FPGAs have the potential to achieve far greater performance than software as a result of omitting the fetch-decode-execute operations of General Purpose Processors (GPPs) and introducing more parallel computation. This can be exploited by utilizing many resources at the same time. Creating parallel algorithms computed with FPGAs in pure HDL is difficult and time consuming. Implementation time can be greatly reduced with a High-Level Synthesis compiler. This article describes the design methodologies and tools, implementation, and first results of the created VHDL backend for the RPython compiler.
FX-87 performance measurements: data-flow implementation. Technical report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hammel, R.T.; Gifford, D.K.

1988-11-01

This report documents a series of experiments performed to explore the thesis that the FX-87 effect system permits a compiler to schedule imperative programs (i.e., programs that may contain side-effects) for execution on a parallel computer. The authors analyze how much the FX-87 static effect system can improve the execution times of five benchmark programs on a parallel graph interpreter. Three of their benchmark programs do not use side-effects (factorial, fibonacci, and polynomial division) and thus did not have any effect-induced constraints. Their FX-87 performance was comparable to their performance in a purely functional language. Two of the benchmark programs use side effects (DNA sequence matching and Scheme interpretation) and the compiler was able to use effect information to reduce their execution times by factors of 1.7 to 5.4 when compared with sequential execution times. These results support the thesis that a static effect system is a powerful tool for compilation to multiprocessor computers. However, the graph interpreter we used was based on unrealistic assumptions, and thus our results may not accurately reflect the performance of a practical FX-87 implementation. The results also suggest that conventional loop analysis would complement the FX-87 effect system.
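How effect information can license parallel scheduling is sketched below. This is a simplified illustration and not the FX-87 effect rules themselves: effects are modeled only as sets of named regions that an expression may read or write, and the class, function and region names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Effect:
    """Regions of memory that an expression may read or write."""

    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()

    @property
    def is_pure(self) -> bool:
        return not self.reads and not self.writes


def may_run_in_parallel(a: Effect, b: Effect) -> bool:
    """True when neither expression writes a region the other touches."""
    a_conflicts = a.writes & (b.reads | b.writes)
    b_conflicts = b.writes & (a.reads | a.writes)
    return not a_conflicts and not b_conflicts


# Hypothetical effects for a few expressions.
pure_call = Effect()  # e.g. factorial: no side effects at all
read_table = Effect(reads=frozenset({"table"}))
update_table = Effect(reads=frozenset({"table"}), writes=frozenset({"table"}))
write_log = Effect(writes=frozenset({"log"}))
```

A scheduler built on such a check can run pure expressions and non-conflicting effectful expressions concurrently, and must keep conflicting ones in program order.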
Functional Programming in Computer Science

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anderson, Loren James; Davis, Marion Kei

We explore functional programming through a 16-week internship at Los Alamos National Laboratory. Functional programming is a branch of computer science that has exploded in popularity over the past decade due to its high-level syntax, ease of parallelization, and abundant applications. First, we summarize functional programming by listing the advantages of functional programming languages over the usual imperative languages, and we introduce the concept of parsing. Second, we discuss the importance of lambda calculus in the theory of functional programming. Lambda calculus was invented by Alonzo Church in the 1930s to formalize the concept of effective computability, and every functional language is essentially some implementation of lambda calculus. Finally, we display the lasting products of the internship: additions to a compiler and runtime system for the pure functional language STG, including both a set of tests that indicate the validity of updates to the compiler and a compiler pass that checks for illegal instances of duplicate names.
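The claim that functional languages are implementations of lambda calculus can be made concrete with Church numerals, the standard encoding of natural numbers as functions. The sketch below uses Python lambdas as a stand-in for lambda terms; it is a textbook illustration and is not taken from the internship report.

```python
# Church numerals: the number n is the function that applies f to x n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))


def to_int(n) -> int:
    """Decode a Church numeral by counting applications of `+ 1` to 0."""
    return n(lambda k: k + 1)(0)


one = succ(zero)
two = succ(one)
three = add(one)(two)
six = mul(two)(three)
```

Nothing here relies on built-in integers except the decoder `to_int`; the arithmetic itself is expressed purely through function abstraction and application.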
Design of a real-time wind turbine simulator using a custom parallel architecture

NASA Technical Reports Server (NTRS)

Hoffman, John A.; Gluck, R.; Sridhar, S.

1995-01-01

The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.
ms2: A molecular simulation tool for thermodynamic properties

NASA Astrophysics Data System (ADS)

Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran

2011-11-01

This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for a fast execution on a broad range of computer architectures, spanning from single processor PCs over PC-clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 have been shown for a large variety of fluids in preceding work. Program summary Program title: ms2 Catalogue identifier: AEJF_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Special Licence supplied by the authors No. of lines in distributed program, including test data, etc.: 82 794 No. of bytes in distributed program, including test data, etc.: 793 705 Distribution format: tar.gz Programming language: Fortran90 Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single processor machines over PC-clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.) Operating system: Unix/Linux, Windows Has the code been vectorized or parallelized?: Yes, using the Message Passing Interface (MPI) protocol. Scalability: Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations. RAM: ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules. Classification: 7.7, 7.9, 12 External routines: Message Passing Interface (MPI) Nature of problem: Calculation of application oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures. Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism. Restrictions: No. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less. Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories. Additional comments: Sample makefiles for multiple operation platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de. Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation.
Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours, calculating transport properties between six and 24 hours.
: A Scalable and Transparent System for Simulating MPI Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perumalla, Kalyan S

2010-01-01

is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source-code is available. The set of source-code interfaces supported by is being expanded to support a wider set of applications, and MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of purely discrete event style of execution, and due to the scalability and efficiency of the underlying parallel discrete event simulation engine, sik. In the largest runs, has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.
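The discrete event style of execution mentioned in this record can be illustrated with a minimal event loop ordered by virtual time. The sketch is sequential and deliberately simple; it is not the parallel simulation engine described in the abstract, and the class name, method names and latency value are hypothetical.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class Simulator:
    """Minimal sequential discrete-event loop ordered by virtual time."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue: List[Tuple[float, int, Callable[[], None]]] = []
        self._ids = itertools.count()  # tie-breaker keeps insertion order

    def schedule(self, delay: float, action: Callable[[], None]) -> None:
        """Schedule `action` to run `delay` units of virtual time from now."""
        if delay < 0:
            raise ValueError("events cannot be scheduled in the past")
        heapq.heappush(self._queue, (self.now + delay, next(self._ids), action))

    def run(self) -> None:
        """Process events in timestamp order until none remain."""
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action()  # an action may schedule further events


# Hypothetical scenario: virtual rank 0 sends a message to virtual rank 1,
# and the message takes 2.5 units of virtual time to arrive.
sim = Simulator()
log: List[Tuple[float, str]] = []


def send() -> None:
    log.append((sim.now, "rank 0 sends"))
    sim.schedule(2.5, lambda: log.append((sim.now, "rank 1 receives")))


sim.schedule(1.0, send)
sim.run()
```

Virtual time jumps directly from one event to the next, so simulated idle periods cost no real time; a parallel engine additionally has to keep the clocks of many such loops consistent.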
Two-dimensional parallel array technology as a new approach to automated combinatorial solid-phase organic synthesis

PubMed

Brennan; Biddison; Frauendorf; Schwarcz; Keen; Ecker; Davis; Tinder; Swayze

1998-01-01

An automated, 96-well parallel array synthesizer for solid-phase organic synthesis has been designed and constructed. The instrument employs a unique reagent array delivery format, in which each reagent utilized has a dedicated plumbing system. An inert atmosphere is maintained during all phases of a synthesis, and temperature can be controlled via a thermal transfer plate which holds the injection molded reaction block. The reaction plate assembly slides in the X-axis direction, while eight nozzle blocks holding the reagent lines slide in the Y-axis direction, allowing for the extremely rapid delivery of any of 64 reagents to 96 wells. In addition, there are six banks of fixed nozzle blocks, which deliver the same reagent or solvent to eight wells at once, for a total of 72 possible reagents. The instrument is controlled by software which allows the straightforward programming of the synthesis of a larger number of compounds. This is accomplished by supplying a general synthetic procedure in the form of a command file, which calls upon certain reagents to be added to specific wells via lookup in a sequence file. The bottle position, flow rate, and concentration of each reagent are stored in a separate reagent table file. To demonstrate the utility of the parallel array synthesizer, a small combinatorial library of hydroxamic acids was prepared in high throughput mode for biological screening. Approximately 1300 compounds were prepared on a 10 μmole scale (3-5 mg) in a few weeks. The resulting crude compounds were generally >80% pure, and were utilized directly for high throughput screening in antibacterial assays. Several active wells were found, and the activity was verified by solution-phase synthesis of analytically pure material, indicating that the system described herein is an efficient means for the parallel synthesis of compounds for lead discovery. Copyright 1998 John Wiley & Sons, Inc.

Study of high-performance canonical molecular orbitals calculation for proteins

NASA Astrophysics Data System (ADS)

Hirano, Toshiyuki; Sato, Fumitoshi

2017-11-01

The canonical molecular orbital (CMO) calculation can help to understand chemical properties and reactions in proteins. However, it is difficult to perform the CMO calculation of proteins because of its self-consistent field (SCF) convergence problem and expensive computational cost. To obtain the CMOs of proteins reliably, we research and develop high-performance CMO applications and perform experimental studies. We have proposed the third-generation density-functional calculation method for the SCF calculation, which is more advanced than the FILE and direct methods. Our method is based on Cholesky decomposition for the two-electron integral calculation and the modified grid-free method for the pure-XC term evaluation. With the third-generation density-functional calculation method, the Coulomb, Fock-exchange, and pure-XC terms can be obtained by simple linear algebraic procedures in the SCF loop. Therefore, we can expect good parallel performance in solving the SCF problem by using a well-optimized linear algebra library such as BLAS on distributed-memory parallel computers. The third-generation density-functional calculation method is implemented in our program, ProteinDF. Computing the electronic structure of a large molecule requires not only overcoming the expensive computational cost but also a good initial guess for safe SCF convergence. In order to prepare a precise initial guess for the macromolecular system, we have developed the quasi-canonical localized orbital (QCLO) method. The QCLO has the characteristics of both localized and canonical orbitals in a certain region of the molecule. We have succeeded in the CMO calculations of proteins by using the QCLO method. For simplified and semi-automated calculation of the QCLO method, we have also developed a Python-based program, QCLObot.
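To illustrate why Cholesky-decomposed two-electron integrals reduce the Coulomb and Fock-exchange terms to plain linear algebra, here is a minimal pure-Python sketch. It assumes the integrals are approximated as (pq|rs) ≈ Σ_K L^K_pq L^K_rs; the function name and the toy 2x2 matrices are hypothetical and are not taken from ProteinDF, which would use an optimized library such as BLAS rather than nested loops.

```python
def coulomb_and_exchange(chol_vectors, density):
    """Build Coulomb (J) and exchange (K) matrices from Cholesky vectors.

    Assumes (pq|rs) ~= sum_K L[K][p][q] * L[K][r][s], so that
      J[p][q] = sum_K L[K][p][q] * sum_rs L[K][r][s] * D[r][s]
      K[p][q] = sum_K (L[K] * D * L[K]^T)[p][q]
    """
    n = len(density)
    J = [[0.0] * n for _ in range(n)]
    K = [[0.0] * n for _ in range(n)]
    for L in chol_vectors:
        # Coulomb: one scalar contraction per Cholesky vector.
        w = sum(L[r][s] * density[r][s] for r in range(n) for s in range(n))
        # Exchange: half-transform L with the density, then close with L^T.
        LD = [[sum(L[p][r] * density[r][s] for r in range(n)) for s in range(n)]
              for p in range(n)]
        for p in range(n):
            for q in range(n):
                J[p][q] += L[p][q] * w
                K[p][q] += sum(LD[p][s] * L[q][s] for s in range(n))
    return J, K


# Toy example with two hypothetical Cholesky vectors and a 2x2 density matrix.
chol = [[[1.0, 0.5], [0.5, 2.0]], [[0.0, 1.0], [1.0, 0.0]]]
dens = [[1.0, 0.2], [0.2, 0.5]]
J, K = coulomb_and_exchange(chol, dens)
```

Each Cholesky vector contributes independently, so in a distributed-memory setting the loop over vectors is a natural unit of work to split across processes, with a final reduction of J and K.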
Structural modeling of carbonaceous mesophase amphotropic mixtures under uniaxial extensional flow.

PubMed

Golmohammadi, Mojdeh; Rey, Alejandro D

2010-07-21

The extended Maier-Saupe model for binary mixtures of model carbonaceous mesophases (uniaxial discotic nematogens) under externally imposed flow, formulated in previous studies [M. Golmohammadi and A. D. Rey, Liquid Crystals 36, 75 (2009); M. Golmohammadi and A. D. Rey, Entropy 10, 183 (2008)], is used to characterize the effect of uniaxial extensional flow and concentration on phase behavior and structure of these mesogenic blends. The generic thermorheological phase diagram of the single-phase binary mixture, given in terms of temperature (T) and Deborah (De) number, shows the existence of four T-De transition lines that define regions that correspond to the following quadrupolar tensor order parameter structures: (i) oblate (perpendicular, parallel), (ii) prolate (perpendicular, parallel), (iii) scalene O(perpendicular, parallel), and (iv) scalene P(perpendicular, parallel), where the symbols (perpendicular, parallel) indicate alignment of the tensor order ellipsoid with respect to the extension axis. It is found that with increasing T the dominant component of the mixture exhibits weak deviations from the well-known pure species response to uniaxial extensional flow (uniaxial perpendicular nematic-->biaxial nematic-->uniaxial parallel paranematic). In contrast, the slaved component shows a strong deviation from the pure species response. This deviation is dictated by the asymmetric viscoelastic coupling effects emanating from the dominant component. Changes in conformation (oblate <==> prolate) and orientation (perpendicular <==> parallel) are effected through changes in pairs of eigenvalues of the quadrupolar tensor order parameter. The complexity of the structural sensitivity to temperature and extensional flow is a reflection of the dual lyotropic/thermotropic nature (amphotropic nature) of the mixture and their cooperation/competition. 
The analysis demonstrates that the simple structures (biaxial nematic and uniaxial paranematic) observed in pure discotic mesogens under uniaxial extensional flow are significantly enriched by the interaction of the lyotropic/thermotropic competition with the binary molecular architectures and with the quadrupolar nature of the flow.
Comparison of adult age differences in verbal and visuo-spatial memory: the importance of 'pure', parallel and validated measures.

PubMed

Kemps, Eva; Newson, Rachel

2006-04-01

The study compared age-related decrements in verbal and visuo-spatial memory across a broad elderly adult age range. Twenty-four young (18-25 years), 24 young-old (65-74 years), 24 middle-old (75-84 years) and 24 old-old (85-93 years) adults completed parallel recall and recognition measures of verbal and visuo-spatial memory from the Doors and People Test (Baddeley, Emslie & Nimmo-Smith, 1994). These constituted 'pure' and validated indices of either verbal or visuo-spatial memory. Verbal and visuo-spatial memory declined similarly with age, with a steeper decline in recall than recognition. Unlike recognition memory, recall performance also showed a heightened decline after the age of 85. Age-associated memory loss in both modalities was largely due to working memory and executive function. Processing speed and sensory functioning (vision, hearing) made minor contributions to memory performance and age differences in it. Together, these findings demonstrate common, rather than differential, age-related effects on verbal and visuo-spatial memory. They also emphasize the importance of using 'pure', parallel and validated measures of verbal and visuo-spatial memory in memory ageing research.
Neurite, a Finite Difference Large Scale Parallel Program for the Simulation of Electrical Signal Propagation in Neurites under Mechanical Loading

PubMed Central

García-Grajales, Julián A.; Rucabado, Gabriel; García-Dopico, Antonio; Peña, José-María; Jérusalem, Antoine

2015-01-01

With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite (explicit and implicit) were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. 
This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented dendritic tree, and a damaged axon. The capabilities of the program to deal with large scale scenarios, segmented neuronal structures, and functional deficits under mechanical loading are specifically highlighted. PMID:25680098
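As a rough illustration of the kind of explicit finite difference update that cable theory leads to, the sketch below advances the passive cable equation τ ∂V/∂t = λ² ∂²V/∂x² − V by forward Euler steps. The function names, the sealed-end boundary treatment, and the parameter values are assumptions made for this example; Neurite's actual discretization and its coupling to the mechanical model are not reproduced here.

```python
def cable_step(v, dt, dx, tau, lam):
    """Advance the passive cable equation by one explicit (forward Euler) step.

    Solves tau * dV/dt = lam**2 * d2V/dx2 - V on a uniform grid with sealed
    (zero-flux) ends, where V is the deviation from the resting potential.
    """
    n = len(v)
    new_v = [0.0] * n
    for i in range(n):
        # Mirrored ghost nodes impose zero flux at both ends.
        left = v[i - 1] if i > 0 else v[1]
        right = v[i + 1] if i < n - 1 else v[n - 2]
        d2v = (left - 2.0 * v[i] + right) / dx ** 2
        new_v[i] = v[i] + dt / tau * (lam ** 2 * d2v - v[i])
    return new_v


def simulate(v0, steps, dt, dx, tau, lam):
    # A sufficient stability condition for this explicit scheme is
    # dt / tau * (1 + 2 * lam**2 / dx**2) <= 1.
    v = list(v0)
    for _ in range(steps):
        v = cable_step(v, dt, dx, tau, lam)
    return v


# A unit pulse in the middle of a five-node cable spreads and decays.
after_one_step = cable_step([0.0, 0.0, 1.0, 0.0, 0.0], dt=0.001, dx=0.1, tau=1.0, lam=1.0)
```

Every node update within a step depends only on values from the previous step, which is what makes an explicit solver of this kind straightforward to map onto one GPU thread per node.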
ZENO: N-body and SPH Simulation Codes

NASA Astrophysics Data System (ADS)

Barnes, Joshua E.

2011-02-01

The ZENO software package integrates N-body and SPH simulation codes with a large array of programs to generate initial conditions and analyze numerical simulations. Written in C, the ZENO system is portable between Mac, Linux, and Unix platforms. It is in active use at the Institute for Astronomy (IfA), at NRAO, and possibly elsewhere. Zeno programs can perform a wide range of simulation and analysis tasks. While many of these programs were first created for specific projects, they embody algorithms of general applicability and embrace a modular design strategy, so existing code is easily applied to new tasks. Major elements of the system include: Structured data file utilities facilitate basic operations on binary data, including import/export of ZENO data to other systems. Snapshot generation routines create particle distributions with various properties. Systems with user-specified density profiles can be realized in collisionless or gaseous form; multiple spherical and disk components may be set up in mutual equilibrium. Snapshot manipulation routines permit the user to sift, sort, and combine particle arrays, translate and rotate particle configurations, and assign new values to data fields associated with each particle. Simulation codes include both pure N-body and combined N-body/SPH programs: Pure N-body codes are available in both uniprocessor and parallel versions. SPH codes offer a wide range of options for gas physics, including isothermal, adiabatic, and radiating models. Snapshot analysis programs calculate temporal averages, evaluate particle statistics, measure shapes and density profiles, compute kinematic properties, and identify and track objects in particle distributions. Visualization programs generate interactive displays and produce still images and videos of particle distributions; the user may specify arbitrary color schemes and viewing transformations.
Pythran: enabling static optimization of scientific Python programs

NASA Astrophysics Data System (ADS)

Guelton, Serge; Brunet, Pierrick; Amini, Mehdi; Merlini, Adrien; Corbillon, Xavier; Raynaud, Alan

2015-01-01

Pythran is an open source static compiler that turns modules written in a subset of Python language into native ones. Assuming that scientific modules do not rely much on the dynamic features of the language, it trades them for powerful, possibly inter-procedural, optimizations. These optimizations include detection of pure functions, temporary allocation removal, constant folding, Numpy ufunc fusion and parallelization, explicit thread-level parallelism through OpenMP annotations, false variable polymorphism pruning, and automatic vector instruction generation such as AVX or SSE. In addition to these compilation steps, Pythran provides a C++ runtime library that leverages the C++ STL to provide generic containers, and the Numeric Template Toolbox for Numpy support. It takes advantage of modern C++11 features such as variadic templates, type inference, move semantics and perfect forwarding, as well as classical idioms such as expression templates. Unlike the Cython approach, Pythran input code remains compatible with the Python interpreter. Output code is generally as efficient as the annotated Cython equivalent, if not more, but without the backward compatibility loss.
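A small sketch of what such an annotated module can look like is given below. The `#pythran export` comment declares the native entry point and its argument types, and the `#omp` comment requests an OpenMP parallel loop; both are ordinary comments to the Python interpreter, so the file still runs unmodified. The function name `dot` is a hypothetical example, and the exact annotation forms should be checked against the Pythran documentation for the installed version.

```python
#pythran export dot(float list, float list)
def dot(xs, ys):
    """Dot product of two equally long lists of floats."""
    total = 0.0
    # The next comment is an OpenMP annotation for Pythran; plain Python ignores it.
    #omp parallel for reduction(+:total)
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


# Runs unchanged under the standard interpreter.
result = dot([1.0, 2.0], [3.0, 4.0])
```

Saved as a module file, this would typically be compiled by passing the file to the `pythran` command, after which importing the module picks up the native version.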
Model-Based Systems Engineering in the Execution of Search and Rescue Operations

DTIC Science & Technology

2015-09-01

OSC can fulfill the duties of an ACO but it may make sense to split the duties if there are no communication links between the OSC and participating...parallel mode. This mode is the most powerful option because it creates sequence diagrams that generate parallel “swim lanes” for each asset...greater flexibility is desired, sequence mode generates diagrams based purely on sequential action and activity diagrams without the parallel “swim lanes
pureS2HAT: S2HAT-based Pure E/B Harmonic Transforms

NASA Astrophysics Data System (ADS)

Grain, J.; Stompor, R.; Tristram, M.

2011-10-01

The pS2HAT routines allow efficient, parallel calculation of the so-called 'pure' polarized multipoles. The computed multipole coefficients are equal to the standard pseudo-multipoles calculated for the apodized sky maps of the Stokes parameters Q and U, subsequently corrected by so-called counterterms. If the applied apodizations fulfill certain boundary conditions, these multipoles correspond to the pure multipoles. Pure multipoles of one type, i.e., either E or B, are ensured not to contain contributions from the other one, at least to within numerical artifacts. They can therefore be used in the estimation of the sky power spectra via the pseudo power spectrum technique, which must, however, correctly account for the applied apodization on the one hand and the presence of the counterterms on the other. In addition, the package contains routines for calculating the spin-weighted apodizations from an input scalar, i.e., spin-0, window. The spin-weighted apodizations are needed to compute the counterterms. The package also provides routines for map and window manipulations. The routines are written in C and based on the S2HAT library, which is used to perform all required spherical harmonic transforms as well as all inter-processor communication. They are therefore parallelized using MPI and follow the distributed-memory computational model. The data distribution patterns, pixelization choices, conventions, etc. are all those assumed or allowed by the S2HAT library.
Kip, Version 1.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Staley, Martin

2017-09-20

This high-performance ray tracing library provides very fast rendering; compact code; type flexibility through C++ "generic programming" techniques; and ease of use via an application programming interface (API) that operates independently of any GUI, on-screen display, or other enclosing application. Kip supports constructive solid geometry (CSG) models based on a wide variety of built-in shapes and logical operators, and also allows for user-defined shapes and operators to be provided. Additional features include basic texturing; input/output of models using a simple human-readable file format and with full error checking and detailed diagnostics; and support for shared data parallelism. Kip is written in pure, ANSI standard C++; is entirely platform independent; and is very easy to use. As a C++ "header only" library, it requires no build system, configuration or installation scripts, wizards, non-C++ preprocessing, makefiles, shell scripts, or external libraries.
A Tutorial on Parallel and Concurrent Programming in Haskell

NASA Astrophysics Data System (ADS)

Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
A Conforming Multigrid Method for the Pure Traction Problem of Linear Elasticity: Mixed Formulation

NASA Technical Reports Server (NTRS)

Lee, Chang-Ock

1996-01-01

A multigrid method using conforming P-1 finite element is developed for the two-dimensional pure traction boundary value problem of linear elasticity. The convergence is uniform even as the material becomes nearly incompressible. A heuristic argument for acceleration of the multigrid method is discussed as well. Numerical results with and without this acceleration as well as performance estimates on a parallel computer are included.
Ordinary mode instability associated with thermal ring distribution

NASA Astrophysics Data System (ADS)

Hadi, F.; Yoon, P. H.; Qamar, A.

2015-02-01

The purely growing ordinary (O) mode instability driven by excessive parallel temperature anisotropy has recently received renewed attention owing to its potential applicability to the solar wind plasma. Previous studies of O mode instability have assumed either bi-Maxwellian or counter-streaming velocity distributions. For solar wind plasma trapped in magnetic mirror-like geometry such as magnetic clouds or in the vicinity of the Earth's collisionless bow shock environment, however, the velocity distribution function may possess a loss-cone feature. The O-mode instability in such a case may be excited for cyclotron harmonics as well as the purely growing branch. The present paper investigates the O-mode instability for plasmas characterized by the parallel Maxwellian distribution and perpendicular thermal ring velocity distribution in order to understand the general stability characteristics.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Alignment between Protostellar Outflows and Filamentary Structure

NASA Astrophysics Data System (ADS)

Stephens, Ian W.; Dunham, Michael M.; Myers, Philip C.; Pokhrel, Riwaj; Sadavoy, Sarah I.; Vorobyov, Eduard I.; Tobin, John J.; Pineda, Jaime E.; Offner, Stella S. R.; Lee, Katherine I.; Kristensen, Lars E.; Jørgensen, Jes K.; Goodman, Alyssa A.; Bourke, Tyler L.; Arce, Héctor G.; Plunkett, Adele L.

2017-09-01

We present new Submillimeter Array (SMA) observations of CO(2-1) outflows toward young, embedded protostars in the Perseus molecular cloud as part of the Mass Assembly of Stellar Systems and their Evolution with the SMA (MASSES) survey. For 57 Perseus protostars, we characterize the orientation of the outflow angles and compare them with the orientation of the local filaments as derived from Herschel observations. We find that the relative angles between outflows and filaments are inconsistent with purely parallel or purely perpendicular distributions. Instead, the observed distribution of outflow-filament angles is more consistent with either randomly aligned angles or a mix of projected parallel and perpendicular angles. A mix of parallel and perpendicular angles requires perpendicular alignment to be more common by a factor of ~3. Our results show that the observed distributions probably hold regardless of the protostar’s multiplicity, age, or the host core’s opacity. These observations indicate that the angular momentum axis of a protostar may be independent of the large-scale structure. We discuss the significance of independent protostellar rotation axes in the general picture of filament-based star formation.
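Because only angles projected on the plane of the sky are observed, a three-dimensional alignment does not map one-to-one onto a measured angle. The sketch below is one way to explore that effect with a simple Monte Carlo: it draws random pairs of axes, keeps those whose 3D angle lies in a chosen range, and returns their projected angles. The function names and the angle ranges used are hypothetical choices for illustration, not the procedure or thresholds of the survey.

```python
import math
import random


def projected_angle(a, b):
    """Angle in degrees (0 to 90) between two 3D axes after dropping the z component."""
    ax, ay = a[0], a[1]
    bx, by = b[0], b[1]
    cos_t = abs(ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(min(1.0, cos_t)))


def random_unit_vector(rng):
    # Normalizing three Gaussian draws gives an isotropic direction.
    x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


def sample_projected_angles(n, min_3d, max_3d, rng):
    """Projected angles for n random axis pairs whose 3D angle lies in [min_3d, max_3d]."""
    angles = []
    while len(angles) < n:
        a, b = random_unit_vector(rng), random_unit_vector(rng)
        cos_3d = abs(sum(p * q for p, q in zip(a, b)))
        angle_3d = math.degrees(math.acos(min(1.0, cos_3d)))
        if min_3d <= angle_3d <= max_3d:  # rejection sampling on the 3D angle
            angles.append(projected_angle(a, b))
    return angles


# Two axes that are perpendicular in 3D can appear parallel in projection.
looks_parallel = projected_angle((1.0, 0.0, 1.0), (1.0, 0.0, -1.0))
perpendicular_sample = sample_projected_angles(50, 70.0, 90.0, random.Random(0))
```

Comparing histograms of such samples for different 3D ranges against an observed histogram is the general idea; any statistical test used for that comparison is outside this sketch.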
Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

PubMed

Nadkarni, P M; Miller, P L

1991-01-01

A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
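Linda coordinates processes through a shared tuple space with operations that deposit a tuple (out), withdraw a matching tuple (in), and read one without removing it (rd). The following is a minimal thread-based sketch of that model in Python, not an implementation of Linda or Network Linda; the class, the `None` wildcard convention, and the toy match-counting score are assumptions made for illustration.

```python
import threading


class TupleSpace:
    """Toy tuple space: out() deposits, in_() withdraws, rd() reads without removing."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _take(self, pattern, remove):
        # None in a pattern acts as a wildcard; block until some tuple matches.
        with self._cond:
            while True:
                for tup in self._tuples:
                    if len(tup) == len(pattern) and all(
                        p is None or p == v for p, v in zip(pattern, tup)
                    ):
                        if remove:
                            self._tuples.remove(tup)
                        return tup
                self._cond.wait()

    def in_(self, *pattern):
        return self._take(pattern, remove=True)

    def rd(self, *pattern):
        return self._take(pattern, remove=False)


def worker(space):
    while True:
        _, idx, query, target = space.in_("task", None, None, None)
        if idx < 0:  # negative index is the stop signal
            return
        score = sum(a == b for a, b in zip(query, target))
        space.out("result", idx, score)


def compare_all(query, database, n_workers=2):
    space = TupleSpace()
    threads = [threading.Thread(target=worker, args=(space,)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for idx, target in enumerate(database):
        space.out("task", idx, query, target)
    results = [space.in_("result", idx, None)[2] for idx in range(len(database))]
    for _ in threads:
        space.out("task", -1, "", "")
    for t in threads:
        t.join()
    return results


scores = compare_all("ACGT", ["ACGT", "AGGT", "TTTT"])
```

The workers never address each other directly; they only exchange tuples, which is the property that lets the same program structure move between machines when the tuple space implementation changes.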
Parallel programming with Easy Java Simulations

NASA Astrophysics Data System (ADS)

Esquembre, F.; Christian, W.; Belloni, M.

2018-01-01

Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we lower the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
Genetic Parallel Programming: design and implementation.

PubMed

Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

2006-01-01

This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

PubMed Central

Nadkarni, P. M.; Miller, P. L.

1991-01-01

A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632
Comparative in vitro biocompatibility of nickel-titanium, pure nickel, pure titanium, and stainless steel: genotoxicity and atomic absorption evaluation.

PubMed

Assad, M; Lemieux, N; Rivard, C H; Yahia, L H

1999-01-01

The genotoxicity level of nickel-titanium (NiTi) was compared to that of its pure constituents, pure nickel (Ni) and pure titanium (Ti) powders, and also to 316L stainless steel (316L SS) as clinical reference material. In order to do so, a dynamic in vitro semiphysiological extraction was performed with all metals using agitation and ISO requirements. Peripheral blood lymphocytes were then cultured in the presence of all material extracts, and their comparative genotoxicity levels were assessed using electron microscopy-in situ end-labeling (EM-ISEL) coupled to immunogold staining. Cellular chromatin exposition to pure Ni and 316L SS demonstrated a significantly stronger gold binding than exposition to NiTi, pure Ti, or the untreated control. In parallel, graphite furnace atomic absorption spectrophotometry (AAS) was also performed on all extraction media. The release of Ni atoms took the following decreasing distribution for the different resulting semiphysiological solutions: pure Ni, 316L SS, NiTi, Ti, and controls. Ti elements were detected after elution of pure titanium only. Both pure titanium and nickel-titanium specimens obtained a relative in vitro biocompatibility. Therefore, this quantitative in vitro study provides optimistic results for the eventual use of nickel-titanium alloys as surgical implant materials.
Bilingual parallel programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Foster, I.; Overbeek, R.

1990-01-01

Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.

A model for cytoplasmic rheology consistent with magnetic twisting cytometry.

PubMed

Butler, J P; Kelly, S M

1998-01-01

Magnetic twisting cytometry is gaining wide applicability as a tool for the investigation of the rheological properties of cells and the mechanical properties of receptor-cytoskeletal interactions. Current technology involves the application and release of magnetically induced torques on small magnetic particles bound to or inside cells, with measurements of the resulting angular rotation of the particles. The properties of purely elastic or purely viscous materials can be determined by the angular strain and strain rate, respectively. However, the cytoskeleton and its linkage to cell surface receptors display elastic, viscous, and even plastic deformation, and the simultaneous characterization of these properties using only elastic or viscous models is internally inconsistent. Data interpretation is complicated by the fact that in current technology, the applied torques are not constant in time, but decrease as the particles rotate. This paper describes an internally consistent model consisting of a parallel viscoelastic element in series with a parallel viscoelastic element, and one approach to quantitative parameter evaluation. The unified model reproduces all essential features seen in data obtained from a wide variety of cell populations, and contains the pure elastic, viscoelastic, and viscous cases as subsets.
Spin-independent transparency of pure spin current at normal/ferromagnetic metal interface

NASA Astrophysics Data System (ADS)

Hao, Runrun; Zhong, Hai; Kang, Yun; Tian, Yufei; Yan, Shishen; Liu, Guolei; Han, Guangbing; Yu, Shuyun; Mei, Liangmo; Kang, Shishou

2018-03-01

The spin transparency at the normal/ferromagnetic metal (NM/FM) interface was studied in Pt/YIG/Cu/FM multilayers. The spin current generated by the spin Hall effect (SHE) in Pt flows into Cu/FM due to magnetic insulator YIG blocking charge current and transmitting spin current via the magnon current. Therefore, the nonlocal voltage induced by an inverse spin Hall effect (ISHE) in FM can be detected. With the magnetization of FM parallel or antiparallel to the spin polarization of pure spin currents ($\boldsymbol{\sigma}_{sc}$), the spin-independent nonlocal voltage is induced. This indicates that the spin transparency at the Cu/FM interface is spin-independent, which demonstrates that the influence of spin-dependent electrochemical potential due to spin accumulation on the interfacial spin transparency is negligible. Furthermore, a larger spin Hall angle of Fe20Ni80 (Py) than that of Ni is obtained from the nonlocal voltage measurements. Project supported by the National Basic Research Program of China (Grant No. 2015CB921502), the National Natural Science Foundation of China (Grant Nos. 11474184 and 11627805), the 111 Project, China (Grant No. B13029), and the Fundamental Research Funds of Shandong University, China.
Application Portable Parallel Library

NASA Technical Reports Server (NTRS)

Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

1995-01-01

Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also includes heterogeneous collection of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Multiscale Monte Carlo equilibration: Pure Yang-Mills theory

DOE PAGES

Endres, Michael G.; Brower, Richard C.; Orginos, Kostas; ...

2015-12-29

In this study, we present a multiscale thermalization algorithm for lattice gauge theory, which enables efficient parallel generation of uncorrelated gauge field configurations. The algorithm combines standard Monte Carlo techniques with ideas drawn from real space renormalization group and multigrid methods. We demonstrate the viability of the algorithm for pure Yang-Mills gauge theory for both heat bath and hybrid Monte Carlo evolution, and show that it ameliorates the problem of topological freezing up to controllable lattice spacing artifacts.
VizieR Online Data Catalog: Milky Way L/T/M-dwarfs identified in BoRG survey (Holwerda+, 2014)

NASA Astrophysics Data System (ADS)

Holwerda, B. W.; Trenti, M.; Clarkson, W.; Sahu, K.; Bradley, L.; Stiavelli, M.; Pirzkal, N.; de Marchi, G.; Andersen, M.; Bouwens, R.; Ryan, R.

2017-07-01

Our principal data set is the WFC3 data from the BoRG (HST GO/PAR-11700; Trenti et al. 2011ApJ...727L..39T; Bradley et al. 2012ApJ...760..108B) survey to identify Milky Way dwarf stars from their morphology and color. The BoRG observations are undithered HST/WFC3 exposures conducted in pure-parallel with the telescope pointing to a primary spectroscopic target with the Cosmic Origins Spectrograph (COS; typically a high-z QSO at high Galactic latitude). The limitations for such observations are primarily that no dithering strategy can be used (final images are at WFC3 native pixel scale) and total exposure times are dictated by the primary program. (5 data files).
A purely Lagrangian method for simulating the shallow water equations on a sphere using smooth particle hydrodynamics

NASA Astrophysics Data System (ADS)

Capecelatro, Jesse

2018-03-01

It has long been suggested that a purely Lagrangian solution to global-scale atmospheric/oceanic flows can potentially outperform traditional Eulerian schemes. Meanwhile, a demonstration of a scalable and practical framework remains elusive. Motivated by recent progress in particle-based methods when applied to convection dominated flows, this work presents a fully Lagrangian method for solving the inviscid shallow water equations on a rotating sphere in a smooth particle hydrodynamics framework. To avoid singularities at the poles, the governing equations are solved in Cartesian coordinates, augmented with a Lagrange multiplier to ensure that fluid particles are constrained to the surface of the sphere. An underlying grid in spherical coordinates is used to facilitate efficient neighbor detection and parallelization. The method is applied to a suite of canonical test cases, and conservation, accuracy, and parallel performance are assessed.
Alignment between Protostellar Outflows and Filamentary Structure

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stephens, Ian W.; Dunham, Michael M.; Myers, Philip C.

2017-09-01

We present new Submillimeter Array (SMA) observations of CO(2–1) outflows toward young, embedded protostars in the Perseus molecular cloud as part of the Mass Assembly of Stellar Systems and their Evolution with the SMA (MASSES) survey. For 57 Perseus protostars, we characterize the orientation of the outflow angles and compare them with the orientation of the local filaments as derived from Herschel observations. We find that the relative angles between outflows and filaments are inconsistent with purely parallel or purely perpendicular distributions. Instead, the observed distribution of outflow-filament angles is more consistent with either randomly aligned angles or a mix of projected parallel and perpendicular angles. A mix of parallel and perpendicular angles requires perpendicular alignment to be more common by a factor of ∼3. Our results show that the observed distributions probably hold regardless of the protostar’s multiplicity, age, or the host core’s opacity. These observations indicate that the angular momentum axis of a protostar may be independent of the large-scale structure. We discuss the significance of independent protostellar rotation axes in the general picture of filament-based star formation.
Menu-Driven Solver Of Linear-Programming Problems

NASA Technical Reports Server (NTRS)

Viterna, L. A.; Ferencz, D.

1992-01-01

Program assists inexperienced user in formulating linear-programming problems. A Linear Program Solver (ALPS) computer program is full-featured LP analysis program. Solves plain linear-programming problems as well as more-complicated mixed-integer and pure-integer programs. Also contains efficient technique for solution of purely binary linear-programming problems. Written entirely in IBM's APL2/PC software, Version 1.01. Packed program contains licensed material, property of IBM (copyright 1988, all rights reserved).
Partitioning problems in parallel, pipelined and distributed computing

NASA Technical Reports Server (NTRS)

Bokhari, S.

1985-01-01

The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
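To make the chain-partitioning problem above concrete, here is a minimal Python sketch. It is not the Sum-Bottleneck path algorithm from the report: it swaps in a plain dynamic program, and it assumes (for illustration only) that each module has a single execution weight, that communication costs are ignored, and that each processor receives one contiguous block of the chain. The function name `min_bottleneck_partition` and the sample weights are hypothetical.

```python
from itertools import accumulate


def min_bottleneck_partition(weights, processors):
    """Split `weights` into contiguous blocks, one per processor, so that
    the heaviest block (the bottleneck) is as light as possible.

    Returns (bottleneck, blocks). Assumes non-negative weights and ignores
    communication costs, which is an illustrative simplification.
    """
    n = len(weights)
    if n == 0 or processors < 1:
        raise ValueError("need at least one module and one processor")
    prefix = [0, *accumulate(weights)]   # prefix[i] = sum of the first i weights
    p = min(processors, n)               # extra processors would stay idle
    inf = float("inf")
    # best[k][i]: smallest bottleneck when the first i modules use k blocks
    best = [[inf] * (n + 1) for _ in range(p + 1)]
    cut = [[0] * (n + 1) for _ in range(p + 1)]
    best[0][0] = 0
    for k in range(1, p + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):    # last block holds modules j..i-1
                cost = max(best[k - 1][j], prefix[i] - prefix[j])
                if cost < best[k][i]:
                    best[k][i] = cost
                    cut[k][i] = j
    # Walk the recorded cut points backwards to recover the blocks.
    blocks, i = [], n
    for k in range(p, 0, -1):
        j = cut[k][i]
        blocks.append(weights[j:i])
        i = j
    blocks.reverse()
    return best[p][n], blocks
```

For example, `min_bottleneck_partition([4, 2, 7, 1, 3, 5], 3)` splits six modules over three processors. The cubic-time loop is chosen for readability rather than speed.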
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

NASA Technical Reports Server (NTRS)

Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
Physical-Chemical Characterization of Fruit Purees and Relationship with Sensory Analysis Carried out by Infants (12 to 24 mo).

PubMed

Inarejos-García, A M; Mancebo-Campos, V; Cañizares, P; Llanos, J

2015-05-01

Fruit purees are one of the earliest foods introduced in infants' diet during the complementary period. The rheological characteristics together with the sensory analysis are decisive factors for the acceptance of the food product by the infant. The sensory analysis of three commercial fruit purees (mixed fruits, pear, and plum) was studied by employing a new objective sensory parameter named SAIR (Sensory Acceptance by Infants Ratio), which is the quotient between the percentage of puree consumed (%) and the time (seconds) throughout the storage time. In parallel, the rheological characteristics of the purees were analyzed in order to obtain a relationship with the SAIR parameter. It was proved that the best acceptance of the product (higher SAIR) was observed for such purees showing a lower apparent viscosity (lower consistency index, "K") and a less pseudoplastic behavior (higher flow behavior index, "n"). These results may help to obtain higher acceptance values based on easily obtainable and objective parameters. © 2015 Institute of Food Technologists®
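The two quantities named in the abstract above can be written down in a few lines. This is a hedged sketch, not the authors' procedure: SAIR follows the quotient given in the abstract, while the viscosity function assumes the standard power-law (Ostwald-de Waele) model, which is the usual setting for a consistency index K and a flow behavior index n but is not named in the abstract. The function names and sample numbers are hypothetical.

```python
def sair(percent_consumed, seconds):
    """Sensory Acceptance by Infants Ratio: percentage of puree consumed
    divided by the feeding time in seconds."""
    if seconds <= 0:
        raise ValueError("time must be positive")
    return percent_consumed / seconds


def apparent_viscosity(consistency_k, flow_index_n, shear_rate):
    """Apparent viscosity under the assumed power-law model:
    eta = K * shear_rate ** (n - 1).

    For n < 1 the fluid is pseudoplastic (shear-thinning); values of n
    closer to 1 mean less pseudoplastic behavior.
    """
    if shear_rate <= 0:
        raise ValueError("shear rate must be positive")
    return consistency_k * shear_rate ** (flow_index_n - 1.0)
```

With these definitions, a puree eaten faster for the same percentage consumed gets a higher SAIR, and lowering K lowers the apparent viscosity at every shear rate.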
Predicting stability limits for pure and doped dicationic noble gas clusters undergoing coulomb explosion: A parallel tempering based study.

PubMed

Ghorai, Sankar; Chaudhury, Pinaki

2018-05-30

We have used a replica exchange Monte-Carlo procedure, popularly known as Parallel Tempering, to study the problem of Coulomb explosion in homogeneous Ar and Xe dicationic clusters as well as mixed Ar-Xe dicationic clusters of varying sizes with different degrees of relative composition. All the clusters studied have two units of positive charges. The simulations reveal that in all the cases there is a cutoff size below which the clusters fragment. It is seen that for the case of pure Ar, the value is around 95 while for Xe it is 55. For the mixed clusters with increasing Xe content, the cutoff limit for suppression of Coulomb explosion gradually decreases from 95 for a pure Ar to 55 for a pure Xe cluster. The hallmark of this study is this smooth progression. All the clusters are simulated using the reliable potential energy surface developed by Gay and Berne (Gay and Berne, Phys. Rev. Lett. 1982, 49, 194). For the hetero clusters, we have also discussed two different ways of charge distribution, that is one in which both positive charges are on two Xe atoms and the other where the two charges are at a Xe atom and at an Ar atom. The fragmentation patterns observed by us are such that single ionic ejections are the favored dissociating pattern. © 2017 Wiley Periodicals, Inc.
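The replica-exchange step that gives Parallel Tempering its name can be sketched as follows. This is a generic illustration, not the authors' code: it uses the standard Metropolis swap criterion, min(1, exp[(β_i − β_j)(E_i − E_j)]), and takes the replica energies as given numbers instead of evaluating a cluster potential. The names `swap_probability` and `attempt_swaps` are hypothetical.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging the configurations held by two
    replicas at inverse temperatures beta_i and beta_j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(betas, energies, rng=random):
    """Try one exchange between each pair of neighbouring replicas.

    `energies[k]` is the energy of the configuration currently held at
    `betas[k]`; accepted swaps are applied to `energies` in place.
    Returns the list of index pairs that were swapped.
    """
    swapped = []
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            swapped.append((k, k + 1))
    return swapped
```

A swap is always accepted when the colder replica (larger β) currently holds the higher-energy configuration, which is how low-energy structures migrate toward the low-temperature replicas.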
Displacement and deformation measurement for large structures by camera network

NASA Astrophysics Data System (ADS)

Shang, Yang; Yu, Qifeng; Yang, Zhen; Xu, Zhiqiang; Zhang, Xiaohu

2014-03-01

A displacement and deformation measurement method for large structures by a series-parallel connection camera network is presented. By taking the dynamic monitoring of a large-scale crane in lifting operation as an example, a series-parallel connection camera network is designed, and the displacement and deformation measurement method by using this series-parallel connection camera network is studied. The movement range of the crane body is small, and that of the crane arm is large. The displacement of the crane body, the displacement of the crane arm relative to the body and the deformation of the arm are measured. Compared with a pure series or parallel connection camera network, the designed series-parallel connection camera network can be used to measure not only the movement and displacement of a large structure but also the relative movement and deformation of some interesting parts of the large structure by a relatively simple optical measurement system.
An interactive parallel programming environment applied in atmospheric science

NASA Technical Reports Server (NTRS)

vonLaszewski, G.

1996-01-01

This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

2001-01-01

This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.
Thermal conductivity and thermal expansion of graphite fiber/copper matrix composites

NASA Technical Reports Server (NTRS)

Ellis, David L.; Mcdanels, David L.

1991-01-01

The high specific conductivity of graphite fiber/copper matrix (Gr/Cu) composites offers great potential for high heat flux structures operating at elevated temperatures. To determine the feasibility of applying Gr/Cu composites to high heat flux structures, composite plates were fabricated using unidirectional and cross-plied pitch-based P100 graphite fibers in a pure copper matrix. Thermal conductivity of the composites was measured from room temperature to 1073 K, and thermal expansion was measured from room temperature to 1050 K. The longitudinal thermal conductivity, parallel to the fiber direction, was comparable to pure copper. The transverse thermal conductivity, normal to the fiber direction, was less than that of pure copper and decreased with increasing fiber content. The longitudinal thermal expansion decreased with increasing fiber content. The transverse thermal expansion was greater than pure copper and nearly independent of fiber content.
Thermal conductivity and thermal expansion of graphite fiber-reinforced copper matrix composites

NASA Technical Reports Server (NTRS)

Ellis, David L.; Mcdanels, David L.

1993-01-01

The high specific conductivity of graphite fiber/copper matrix (Gr/Cu) composites offers great potential for high heat flux structures operating at elevated temperatures. To determine the feasibility of applying Gr/Cu composites to high heat flux structures, composite plates were fabricated using unidirectional and cross-plied pitch-based P100 graphite fibers in a pure copper matrix. Thermal conductivity of the composites was measured from room temperature to 1073 K, and thermal expansion was measured from room temperature to 1050 K. The longitudinal thermal conductivity, parallel to the fiber direction, was comparable to pure copper. The transverse thermal conductivity, normal to the fiber direction, was less than that of pure copper and decreased with increasing fiber content. The longitudinal thermal expansion decreased with increasing fiber content. The transverse thermal expansion was greater than pure copper and nearly independent of fiber content.
DFT calculations of graphene monolayer in presence of Fe dopant and vacancy

NASA Astrophysics Data System (ADS)

Ostovari, Fatemeh; Hasanpoori, Marziyeh; Abbasnejad, Mohaddeseh; Salehi, Mohammad Ali

2018-07-01

In the present work, the effects of Fe doping and vacancies on the electronic, magnetic and optical properties of graphene are studied by density functional theory based calculations. The conductive behavior is revealed for the various defected graphene by means of electronic density of states. However, defected structures show different magnetic and optical properties compared to those of the pure one. The ferromagnetic phase is the most probable phase by substituting Fe atoms and vacancies at AA sublattice of graphene. The optical properties of impure graphene differ from pure graphene under illumination with parallel polarization of electric field, whereas for perpendicular polarization it remains unchanged. In presence of defect and under parallel polarization of light, the static dielectric constant rises strongly and the maximum peak of Im ε(ω) shows red shift relative to pure graphene. Moreover, the maximum absorption peak broadens in the visible to infrared region at the same condition and the magnitude and related energy of peaks shift to higher value in the EELS spectra. Furthermore, the results show that the maximum values of refractive index and reflectivity spectra increase rapidly and represent the red and blue shifts, respectively. Generally, substituting the C atom with Fe has more effect on magnetic and optical properties relative to the C vacancies.
Architecture Adaptive Computing Environment

NASA Technical Reports Server (NTRS)

Dorband, John E.

2006-01-01

Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
Quantum lattice model solver HΦ

NASA Astrophysics Data System (ADS)

Kawamura, Mitsuaki; Yoshimi, Kazuyoshi; Misawa, Takahiro; Yamaji, Youhei; Todo, Synge; Kawashima, Naoki

2017-08-01

HΦ [aitch-phi] is a program package based on the Lanczos-type eigenvalue solution applicable to a broad range of quantum lattice models, i.e., arbitrary quantum lattice models with two-body interactions, including the Heisenberg model, the Kitaev model, the Hubbard model and the Kondo-lattice model. While it works well on PCs and PC-clusters, HΦ also runs efficiently on massively parallel computers, which considerably extends the tractable range of the system size. In addition, unlike most existing packages, HΦ supports finite-temperature calculations through the method of thermal pure quantum (TPQ) states. In this paper, we explain the theoretical background and user interface of HΦ. We also show the benchmark results of HΦ on supercomputers such as the K computer at RIKEN Advanced Institute for Computational Science (AICS) and SGI ICE XA (Sekirei) at the Institute for the Solid State Physics (ISSP).

The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

NASA Technical Reports Server (NTRS)

Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

2001-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies; as a result, the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

NASA Astrophysics Data System (ADS)

Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.
An object-oriented approach to nested data parallelism

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.; Chatterjee, Siddhartha

1994-01-01

This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
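The idea of "flattening" described above can be illustrated with a small Python sketch. It is not the paper's C++-to-C++ translation: it only assumes, for illustration, that a nested collection is stored as one flat list of values plus a list of segment lengths, so that a nested elementwise loop becomes a single flat pass. The helper names `flatten`, `unflatten`, and `nested_map` are hypothetical.

```python
def flatten(nested):
    """Represent a list of lists as (flat values, segment lengths)."""
    values = [x for inner in nested for x in inner]
    lengths = [len(inner) for inner in nested]
    return values, lengths


def unflatten(values, lengths):
    """Rebuild the nested structure from flat values and segment lengths."""
    nested, start = [], 0
    for n in lengths:
        nested.append(values[start:start + n])
        start += n
    return nested


def nested_map(fn, nested):
    """Apply fn to every inner element.

    Conceptually a 'foreach' nested inside a 'foreach'; after flattening,
    the work is one elementwise pass over a flat vector, which is the form
    a vector machine can execute directly.
    """
    values, lengths = flatten(nested)
    mapped = [fn(x) for x in values]   # the single flat data-parallel step
    return unflatten(mapped, lengths)
```

For instance, `nested_map(lambda x: x * x, [[1, 2], [], [3, 4, 5]])` squares five elements in one flat pass and then restores the original segment structure, including the empty inner collection.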
The BLAZE language: A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, P.; Vanrosendale, J.

1985-01-01

A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.
The State of Software

ERIC Educational Resources Information Center

Day, A. C.

1975-01-01

ALLC members are divided here into pure linguists, pure programmers, and linguist programmers. Five computer languages and the use of packages and coders are discussed briefly. It is suggested that the pure programmers are best able to help the pure linguists with their programming problems. (RM)
Adapting high-level language programs for parallel processing using data flow

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1988-01-01

EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
Convergence issues in domain decomposition parallel computation of hovering rotor

NASA Astrophysics Data System (ADS)

Xiao, Zhongyun; Liu, Gang; Mou, Bin; Jiang, Xiong

2018-05-01

The implicit LU-SGS time integration algorithm has been widely used in parallel computation in spite of its lack of information from adjacent domains. When applied to parallel computation of hovering rotor flows in a rotating frame, it brings about convergence issues. To remedy the problem, three LU factorization-based implicit schemes (consisting of LU-SGS, DP-LUR and HLU-SGS) are investigated comparatively. A test case of pure grid rotation is designed to verify these algorithms, which shows that the LU-SGS algorithm introduces errors on boundary cells. When partition boundaries are circumferential, errors arise in proportion to grid speed, accumulating along with the rotation, and leading to computational failure in the end. Meanwhile, the DP-LUR and HLU-SGS methods show good convergence owing to their boundary treatment, which is desirable in domain decomposition parallel computations.
An integrated control strategy for the composite braking system of an electric vehicle with independently driven axles

NASA Astrophysics Data System (ADS)

Sun, Fengchun; Liu, Wei; He, Hongwen; Guo, Hongqiang

2016-08-01

For an electric vehicle with independently driven axles, an integrated braking control strategy was proposed to coordinate the regenerative braking and the hydraulic braking. The integrated strategy includes three modes, namely the hybrid composite mode, the parallel composite mode and the pure hydraulic mode. For the hybrid composite mode and the parallel composite mode, the coefficients of distributing the braking force between the hydraulic braking and the two motors' regenerative braking were optimised offline, and the response surfaces related to the driving state parameters were established. Meanwhile, the six-sigma method was applied to deal with the uncertainty problems for reliability. Additionally, the pure hydraulic mode is activated to ensure the braking safety and stability when the predictive failure of the response surfaces occurs. Experimental results under given braking conditions showed that the braking requirements could be well met with high braking stability and energy regeneration rate, and the reliability of the braking strategy was guaranteed on general braking conditions.
Spherical Harmonic Solutions to the 3D Kobayashi Benchmark Suite

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, P.N.; Chang, B.; Hanebutte, U.R.

1999-12-29

Spherical harmonic solutions of order 5, 9 and 21 on spatial grids containing up to 3.3 million cells are presented for the Kobayashi benchmark suite. This suite of three problems with simple geometry of pure absorber with large void region was proposed by Professor Kobayashi at an OECD/NEA meeting in 1996. Each of the three problems contains a source, a void and a shield region. Problem 1 can best be described as a box in a box problem, where a source region is surrounded by a square void region which itself is embedded in a square shield region. Problems 2 and 3 represent a shield with a void duct. Problem 2 has a straight duct and Problem 3 a dog-leg-shaped duct. A pure absorber and a 50% scattering case are considered for each of the three problems. The solutions have been obtained with Ardra, a scalable, parallel neutron transport code developed at Lawrence Livermore National Laboratory (LLNL). The Ardra code takes advantage of a two-level parallelization strategy, which combines message passing between processing nodes and thread based parallelism amongst processors on each node. All calculations were performed on the IBM ASCI Blue-Pacific computer at LLNL.
Collectively loading programs in a multiple program multiple data environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.
Pure animal phobia is more specific than other specific phobias: epidemiological evidence from the Zurich Study, the ZInEP and the PsyCoLaus.

PubMed

Ajdacic-Gross, Vladeta; Rodgers, Stephanie; Müller, Mario; Hengartner, Michael P; Aleksandrowicz, Aleksandra; Kawohl, Wolfram; Heekeren, Karsten; Rössler, Wulf; Angst, Jules; Castelao, Enrique; Vandeleur, Caroline; Preisig, Martin

2016-09-01

Interest in subtypes of mental disorders is growing in parallel with continuing research progress in psychiatry. The aim of this study was to examine pure animal phobia in contrast to other specific phobias and a mixed subtype. Data from three representative Swiss community samples were analysed: PsyCoLaus (n = 3720), the ZInEP Epidemiology Survey (n = 1500) and the Zurich Study (n = 591). Pure animal phobia and mixed animal/other specific phobias consistently displayed a low age at onset of first symptoms (8-12 years) and clear preponderance of females (OR > 3). Meanwhile, other specific phobias started up to 10 years later and displayed almost a balanced sex ratio. Pure animal phobia showed no associations with any included risk factors and comorbid disorders, in contrast to numerous associations found in the mixed subtype and in other specific phobias. Across the whole range of epidemiological parameters examined in three different samples, pure animal phobia seems to represent a different entity compared to other specific phobias. The etiopathogenetic mechanisms and risk factors associated with pure animal phobias appear less clear than ever.
Impedance spectroscopy of reduced monoclinic zirconia.

PubMed

Eder, Dominik; Kramer, Reinhard

2006-10-14

Zirconia doped with low-valent cations (e.g. Y3+ or Ca2+) exhibits an exceptionally high ionic conductivity, making it an ideal candidate for various electrochemical applications including solid oxide fuel cells (SOFC) and oxygen sensors. It is nevertheless important to study the undoped, monoclinic ZrO2 as a model system to construct a comprehensive picture of the electrical behaviour. In pure zirconia a residual number of anion vacancies remains because of contaminants in the material as well as the thermodynamic disorder equilibrium, but electronic conduction may also contribute to the observed conductivity. Reduction of zirconia in hydrogen leads to the adsorption of hydrogen and to the formation of oxygen vacancies, with their concentration affected by various parameters (e.g. reduction temperature and time, surface area, and water vapour pressure). However, little is still known about the reactivities of defect species and their effect on the ionic and electronic conduction. Thus, we applied electrochemical impedance spectroscopy to investigate the electric performance of pure monoclinic zirconia with different surface areas in both oxidizing and reducing atmospheres. A novel equivalent circuit model including parallel ionic and electronic conduction has previously been developed for titania and is used herein to decouple the conduction processes. The concentration of defects and their formation energies were measured using volumetric oxygen titration and temperature programmed oxidation/desorption.
The BLAZE language - A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Van Rosendale, John

1987-01-01

A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

NASA Astrophysics Data System (ADS)

Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

2018-02-01

We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computation increases.
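The two benchmark quantities named in this abstract are conventionally defined as speedup S = T_serial / T_parallel and efficiency E = S / p for p processors. The sketch below assumes those textbook definitions (the abstract does not spell them out) and uses hypothetical timings, not measurements from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E = S / p."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical wall-clock times in seconds, keyed by processor count.
timings = {1: 1000.0, 4: 300.0, 16: 100.0}

for p, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, p)
    print(f"p={p:2d}  speedup={s:5.2f}  efficiency={e:4.2f}")
```

With these made-up numbers the run time keeps falling as processors are added while the efficiency drops, which is the qualitative pattern the abstract reports.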
IOPA: I/O-aware parallelism adaption for parallel programs

PubMed Central

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
Parallel language constructs for tensor product computations on loosely coupled architectures

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Vanrosendale, John

1989-01-01

Distributed memory architectures offer high levels of performance and flexibility, but have proven awkward to program. Current languages for nonshared memory architectures provide a relatively low level programming environment, and are poorly suited to modular programming and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. Tensor product array computations are the focus, along with a simple but important class of numerical algorithms. The problem of programming 1-D kernel routines, such as parallel tridiagonal solvers, is addressed first; how such parallel kernels can be combined to form parallel tensor product algorithms is then examined.
Methods for design and evaluation of parallel computating systems (The PISCES project)

NASA Technical Reports Server (NTRS)

Pratt, Terrence W.; Wise, Robert; Haught, Mary JO

1989-01-01

The PISCES project started in 1984 under the sponsorship of the NASA Computational Structural Mechanics (CSM) program. A PISCES 1 programming environment and parallel FORTRAN were implemented in 1984 for the DEC VAX (using UNIX processes to simulate parallel processes). This system was used for experimentation with parallel programs for scientific applications and AI (dynamic scene analysis) applications. PISCES 1 was ported to a network of Apollo workstations by N. Fitzgerald.
Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

NASA Astrophysics Data System (ADS)

Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

2017-12-01

We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domains of several million square kilometers, such as the Arctic region.
New theoretical results for the Lehmann effect in cholesteric liquid crystals

NASA Technical Reports Server (NTRS)

Brand, Helmut R.; Pleiner, Harald

1988-01-01

The Lehmann effect arising in a cholesteric liquid crystal drop when a temperature gradient is applied parallel to its helical axis is investigated theoretically using a local approach. A pseudoscalar quantity is introduced to allow for cross couplings which are absent in nematic liquid crystals, and the statics and dissipative dynamics are analyzed in detail. It is shown that the Lehmann effect is purely dynamic for the case of an external electric field and purely static for an external density gradient, but includes both dynamic and static coupling contributions for the cases of external temperature or concentration gradients.

Resonance-induced sensitivity enhancement method for conductivity sensors

NASA Technical Reports Server (NTRS)

Tai, Yu-Chong (Inventor); Shih, Chi-yuan (Inventor); Li, Wei (Inventor); Zheng, Siyang (Inventor)

2009-01-01

Methods and systems are described for improving the sensitivity of a variety of conductivity sensing devices, in particular capacitively-coupled contactless conductivity detectors. A parallel inductor is added to the conductivity sensor. The sensor with the parallel inductor is operated at a resonant frequency of the equivalent circuit model. At the resonant frequency, parasitic capacitances that are either in series or in parallel with the conductance (and possibly a series resistance) are substantially removed from the equivalent circuit, leaving a purely resistive impedance. An appreciably higher sensor sensitivity results. Experimental verification shows that sensitivity improvements of the order of 10,000-fold are possible. Examples of detecting particulates with high precision by application of the apparatus and methods of operation are described.
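The cancellation described here can be illustrated with the simplest possible equivalent circuit: a resistance R (the conductance being sensed) in parallel with a parasitic capacitance C and the added inductor L. This is a simplified assumption, not the circuit from the patent, which also covers series capacitances; the component values are hypothetical. In this model the admittance is Y = 1/R + jωC + 1/(jωL), and its imaginary part vanishes at f0 = 1 / (2π√(LC)).

```python
import math


def resonant_frequency(c_farad, l_henry):
    """Frequency (Hz) at which the inductor cancels the parallel capacitance."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def impedance(freq_hz, r_ohm, c_farad, l_henry):
    """Complex impedance of R, C and L connected in parallel."""
    w = 2.0 * math.pi * freq_hz
    admittance = 1.0 / r_ohm + 1j * w * c_farad + 1.0 / (1j * w * l_henry)
    return 1.0 / admittance


# Hypothetical component values for illustration only.
R, C, L = 1.0e6, 10e-12, 1.0e-3

f0 = resonant_frequency(C, L)
z_res = impedance(f0, R, C, L)        # at resonance: essentially purely resistive
z_off = impedance(f0 / 10, R, C, L)   # off resonance: dominated by the reactances

print(f"f0 = {f0:.3e} Hz")
print(f"|Z| at resonance  = {abs(z_res):.3e} ohm")
print(f"|Z| off resonance = {abs(z_off):.3e} ohm")
```

At resonance the measured impedance tracks R directly, so a change in the sensed conductance is no longer masked by the reactive terms.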
Computer-aided programming for message-passing system; Problems and a solution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, M.Y.; Gajski, D.D.

1989-12-01

As the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. Program development tools are necessary since programmers are not able to develop complex parallel programs efficiently. Parallel models of computation, parallelization problems, and tools for computer-aided programming (CAP) are discussed. As an example, a CAP tool that performs scheduling and inserts communication primitives automatically is described. It also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs.
Parallel implementation of an adaptive and parameter-free N-body integrator

NASA Astrophysics Data System (ADS)

Pruett, C. David; Ingham, William H.; Herman, Ralph D.

2011-05-01

Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies.
Program summary
Program title: PNB.f90
Catalogue identifier: AEIK_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 3052
No. of bytes in distributed program, including test data, etc.: 68 600
Distribution format: tar.gz
Programming language: Fortran 90 and OpenMPI
Computer: All shared or distributed memory parallel processors
Operating system: Unix/Linux
Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized.
RAM: Dependent upon N
Classification: 4.3, 4.12, 6.5
Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation.
Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree.
Running time: 5.1 s for the demo program supplied with the package.
Parallel solution of sparse one-dimensional dynamic programming problems

NASA Technical Reports Server (NTRS)

Nicol, David M.

1989-01-01

Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-26

... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Medicare and Medicaid Services [CMS-3180-N2] Food and Drug Administration [Docket No. FDA-2010-N-0308] Pilot Program for Parallel Review of Medical... technologies to participate in a program of parallel FDA-CMS review. The document was published with an...
Fast word reading in pure alexia: "fast, yet serial".

PubMed

Bormann, Tobias; Wolfer, Sascha; Hachmann, Wibke; Neubauer, Claudia; Konieczny, Lars

2015-01-01

Pure alexia is a severe impairment of word reading in which individuals process letters serially with a pronounced length effect. Yet, there is considerable variation in the performance of alexic readers with generally very slow, but also occasionally fast responses, an observation addressed rarely in previous reports. It has been suggested that "fast" responses in pure alexia reflect residual parallel letter processing or that they may even be subserved by an independent reading system. Four experiments assessed fast and slow reading in a participant (DN) with pure alexia. Two behavioral experiments investigated frequency, neighborhood, and length effects in forced fast reading. Two further experiments measured eye movements when DN was forced to read quickly, or could respond faster because words were easier to process. Taken together, there was little support for the proposal that "qualitatively different" mechanisms or reading strategies underlie both types of responses in DN. Instead, fast responses are argued to be generated by the same serial-reading strategy.
Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

PubMed

Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

2014-07-05

A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines, combined with the volumetric 3D fast Fourier transform (3D-FFT), was developed and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on a large and fine calculation cell (2048^3 grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent parallel scalability on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of a chymotrypsin inhibitor 2 mutant. The results demonstrate that the massively parallel 3D-RISM program is effective for analyzing the hydration properties of large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Lockyer, Nigel S.; Smith, A.J. Stewart; et al.

In 2004 a team from the University of Pennsylvania, Princeton University, and the Institute for Advanced Study proposed to host the 2008 International Conference on High Energy Physics (ICHEP) on the campus of the University of Pennsylvania in Philadelphia. The proposal was approved later that year by the C-11 committee of the International Union of Pure and Applied Physics. The Co-Chairs were Nigel S. Lockyer (U. Penn/TRIUMF) and A.J. Stewart Smith (Princeton); Joe Kroll of U. Penn served as Deputy Chair from 2007 on. Highlights of the proposal included (1) greatly increased participation of young scientists, women scientists, and graduate students; (2) new emphasis on formal theory; (3) increased focus on astrophysics and cosmology; (4) a large informal poster session (170 posters) in prime time; (5) convenient, contiguous venues for all sessions and lodging; and (6) landmark locations for the reception and banquet. The conference program consisted of three days of parallel sessions and three days of plenary talks.
F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Saini, Subhash (Technical Monitor)

1998-01-01

Parallel programming is still based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called F-Nets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

NASA Technical Reports Server (NTRS)

Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

1994-01-01

Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Using CLIPS in the domain of knowledge-based massively parallel programming

NASA Technical Reports Server (NTRS)

Dvorak, Jiri J.

1994-01-01

The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.
Evolving binary classifiers through parallel computation of multiple fitness cases.

PubMed

Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

2005-06-01

This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
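The "intrinsic parallelism of bitwise operators" mentioned in this abstract can be sketched as follows: pack one binary feature from many fitness cases into a single integer, so that one bitwise operation evaluates a candidate on all cases at once. This is a generic illustration of the technique, not the authors' implementation; `pack`, `fitness` and the toy XOR target are assumptions made for the example.

```python
def pack(bits):
    """Pack a list of 0/1 values into one integer; bit k holds fitness case k."""
    word = 0
    for k, b in enumerate(bits):
        word |= (b & 1) << k
    return word


def fitness(candidate, inputs, target, n_cases):
    """Count the fitness cases on which candidate(*inputs) matches target."""
    mask = (1 << n_cases) - 1
    output = candidate(*inputs) & mask   # one call evaluates every case
    agree = ~(output ^ target) & mask    # bit k is 1 where case k is correct
    return bin(agree).count("1")


# Four fitness cases of a toy two-feature problem whose target is XOR.
x = pack([0, 0, 1, 1])
y = pack([0, 1, 0, 1])
target = pack([0, 1, 1, 0])


def xor_candidate(a, b):
    return a ^ b


def and_candidate(a, b):
    return a & b


print(fitness(xor_candidate, (x, y), target, 4))  # all 4 cases agree
print(fitness(and_candidate, (x, y), target, 4))  # only case 0 agrees
```

Python integers are arbitrary precision, so the same code handles any number of cases; on a fixed-width machine word the cases would be processed in groups of 32 or 64 bits.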
Implementations of BLAST for parallel computers.

PubMed

Jülich, A

1995-02-01

The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
Programming parallel architectures: The BLAZE family of languages

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush

1988-01-01

Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

Modern hardware contains parallel execution resources that are well-suited for data parallelism (vector units) and task parallelism (multicores). However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently as multicores. Our framework allows us to define schedulers that can dynamically select between executing task blocks on vector units or multicores. We show that these schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel programs into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.
High-Performance Design Patterns for Modern Fortran

DOE PAGES

Haveraaen, Magne; Morris, Karla; Rouson, Damian; ...

2015-01-01

This paper presents ideas for using coordinate-free numerics in modern Fortran to achieve code flexibility in the partial differential equation (PDE) domain. We also show how Fortran, over the last few decades, has changed to become a language well-suited for state-of-the-art software development. Fortran’s new coarray distributed data structure, the language’s class mechanism, and its side-effect-free, pure procedure capability provide the scaffolding on which we implement HPC software. These features empower compilers to organize parallel computations with efficient communication. We present some programming patterns that support asynchronous evaluation of expressions comprised of parallel operations on distributed data. We implemented these patterns using coarrays and the message passing interface (MPI). We compared the codes’ complexity and performance. The MPI code is much more complex and depends on external libraries. The MPI code on Cray hardware using the Cray compiler is 1.5–2 times faster than the coarray code on the same hardware. The Intel compiler implements coarrays atop Intel’s MPI library with the result apparently being 2–2.5 times slower than manually coded MPI despite exhibiting nearly linear scaling efficiency. As compilers mature and further improvements to coarrays come in Fortran 2015, we expect this performance gap to narrow.
Evolution of sausage and helical modes in magnetized thin-foil cylindrical liners driven by a Z-pinch

NASA Astrophysics Data System (ADS)

Yager-Elorriaga, D. A.; Lau, Y. Y.; Zhang, P.; Campbell, P. C.; Steiner, A. M.; Jordan, N. M.; McBride, R. D.; Gilgenbach, R. M.

2018-05-01

In this paper, we present experimental results on axially magnetized (Bz = 0.5 - 2.0 T), thin-foil (400 nm-thick) cylindrical liner-plasmas driven with ~600 kA by the Michigan Accelerator for Inductive Z-Pinch Experiments, which is a linear transformer driver at the University of Michigan. We show that: (1) the applied axial magnetic field, irrespective of its direction (e.g., parallel or anti-parallel to the flow of current), reduces the instability amplitude for pure magnetohydrodynamic (MHD) modes [defined as modes devoid of the acceleration-driven magneto-Rayleigh-Taylor (MRT) instability]; (2) axially magnetized, imploding liners (where MHD modes couple to MRT) generate m = 1 or m = 2 helical modes that persist from the implosion to the subsequent explosion stage; (3) the merging of instability structures is a mechanism that enables the appearance of an exponential instability growth rate for a longer than expected time-period; and (4) an inverse cascade in both the axial and azimuthal wavenumbers, k and m, may be responsible for the final m = 2 helical structure observed in our experiments. These experiments are particularly relevant to the magnetized liner inertial fusion program pursued at Sandia National Laboratories, where helical instabilities have been observed.
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Pure-tone Audiometer

NASA Astrophysics Data System (ADS)

Kapul, A. A.; Zubova, E. I.; Torgaev, S. N.; Drobchik, V. V.

2017-08-01

The research focuses on the design of a pure-tone audiometer. The relevance of the study is supported by the high incidence of auditory analyser disorders in older people and children. First, the article provides information about subjective and objective audiometry methods. Second, we present the block diagram and basic circuit arrangement of the device, which is based on an STM32F407VG microcontroller and uses a digital potentiometer as the attenuator. Third, we implemented the connection between the microcontroller and a PC; the C programming language is used for both the microcontroller program and the PC interface. Fourth, we built a prototype of the pure-tone audiometer. In the future, we will implement the objective ASSR method in addition to pure-tone audiometry.
The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

NASA Technical Reports Server (NTRS)

Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;

1998-01-01

Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.

Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

NASA Astrophysics Data System (ADS)

Akil, Mohamed

2017-05-01

Real-time processing is becoming more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it's necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelizations of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare OpenMP (an application programming interface for multi-processing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
Microstructure and crystallographic texture of pure titanium parts generated by laser additive manufacturing

NASA Astrophysics Data System (ADS)

Arias-González, Felipe; del Val, Jesús; Comesaña, Rafael; Penide, Joaquín; Lusquiños, Fernando; Quintero, Félix; Riveiro, Antonio; Boutinguiza, Mohamed; Gil, Francisco Javier; Pou, Juan

2018-01-01

In this paper, the microstructure and crystallographic texture of pure Ti thin walls generated by Additive Manufacturing based on Laser Cladding (AMLC) are analyzed in depth. From the results obtained, it is possible to better understand the AMLC process of pure titanium. The microstructure observed in the samples consists of large elongated columnar prior β grains which have grown epitaxially from the substrate to the top, in parallel to the building direction. Within the prior β grains, α-Ti lamellae and lamellar colonies are the result of cooling from above the β-transus temperature. This transformation follows the Burgers relationship and the result is a basket-weave microstructure with a strong crystallographic texture. Finally, a thermal treatment is proposed to transform the microstructure of the as-deposited samples into an equiaxed microstructure of α-Ti grains.
Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior

NASA Technical Reports Server (NTRS)

Gelenbe, Erol

1988-01-01

An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.
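For reference, the classical form of Amdahl's Law that this note sets out to amend can be sketched in a few lines. This is only the textbook formula, S(N) = 1 / ((1 - p) + p / N); the abstract does not state the amended law or the activity set model, so neither is reproduced here. The names `amdahl_speedup`, `amdahl_limit` and `parallel_fraction` are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Classical Amdahl's Law: S(N) = 1 / ((1 - p) + p / N).

    parallel_fraction (p) is the share of run time that can be parallelized;
    processors (N) is the number of processors used. Both names are
    illustrative and do not come from the abstract.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on the speed-up as the processor count grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Speed-up for a program that is 90% parallelizable.
    for n in (1, 2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The loop shows why the classical law is often read as pessimistic: with a 10% serial share, adding processors cannot push the speed-up past roughly a factor of ten.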
Experiences with hypercube operating system instrumentation

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Rudolph, David C.

1989-01-01

The difficulties in conceptualizing the interactions among a large number of processors make it difficult both to identify the sources of inefficiencies and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed, necessary for large-scale parallel computers; the enormous volume of performance data mandates visual display.
Communications oriented programming of parallel iterative solutions of sparse linear systems

NASA Technical Reports Server (NTRS)

Patrick, M. L.; Pratt, T. W.

1986-01-01

Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

PubMed

Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

2013-10-24

Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (<100 ms) moved to the center only one saccade later. Our data on eye movement positions provide novel evidence of parallel programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.
Dietary Levels of Pure Flavonoids Improve Spatial Memory Performance and Increase Hippocampal Brain-Derived Neurotrophic Factor

PubMed Central

Rendeiro, Catarina; Vauzour, David; Rattray, Marcus; Waffo-Téguo, Pierre; Mérillon, Jean Michel; Butler, Laurie T.; Williams, Claire M.; Spencer, Jeremy P. E.

2013-01-01

Evidence suggests that flavonoid-rich foods are capable of inducing improvements in memory and cognition in animals and humans. However, there is a lack of clarity concerning whether flavonoids are the causal agents in inducing such behavioral responses. Here we show that supplementation with pure anthocyanins or pure flavanols for 6 weeks, at levels similar to that found in blueberry (2% w/w), results in an enhancement of spatial memory in 18 month old rats. Pure flavanols and pure anthocyanins were observed to induce significant improvements in spatial working memory (p = 0.002 and p = 0.006 respectively), to a similar extent to that following blueberry supplementation (p = 0.002). These behavioral changes were paralleled by increases in hippocampal brain-derived neurotrophic factor (R = 0.46, p<0.01), suggesting a common mechanism for the enhancement of memory. However, unlike protein levels of BDNF, the regional enhancement of BDNF mRNA expression in the hippocampus appeared to be predominantly enhanced by anthocyanins. Our data support the claim that flavonoids are likely causal agents in mediating the cognitive effects of flavonoid-rich foods. PMID:23723987
smoothG

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barker, Andrew T.; Gelever, Stephan A.; Lee, Chak S.

2017-12-12

smoothG is a collection of parallel C++ classes/functions that algebraically constructs reduced models of different resolutions from a given high-fidelity graph model. In addition, smoothG also provides efficient linear solvers for the reduced models. Beyond pure graph problems, the software finds its application in subsurface flow and power grid simulations in which graph Laplacians are found.
Lorentz Contraction and Current-Carrying Wires

ERIC Educational Resources Information Center

van Kampen, Paul

2008-01-01

The force between two parallel current-carrying wires is investigated in the rest frames of the ions and the electrons. A straightforward Lorentz transformation shows that what appears as a purely magnetostatic force in the ion frame appears as a combined magnetostatic and electrostatic force in the electron frame. The derivation makes use of a…
Felix Klein and the NCTM's Standards: A Mathematician Considers Mathematics Education.

ERIC Educational Resources Information Center

McComas, Kim Krusen

2000-01-01

Discusses the parallels between Klein's position at the forefront of a movement to reform mathematics education and that of the National Council of Teachers of Mathematics' (NCTM) Standards. Draws a picture of Klein as an important historical figure who saw equal importance in studying pure mathematics, applying mathematics, and teaching…
Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Jian; Hamidouche, Khaled; Zheng, Jie

2015-08-05

Machine Learning algorithms are benefiting from the continuous improvement of programming models, including MPI, MapReduce and PGAS. k-Nearest Neighbors (k-NN) algorithm is a widely used machine learning algorithm, applied to supervised learning tasks such as classification. Several parallel implementations of k-NN have been proposed in the literature and practice. However, on high-performance computing systems with high-speed interconnects, it is important to further accelerate existing designs of the k-NN algorithm through taking advantage of scalable programming models. To improve the performance of k-NN on large-scale environment with InfiniBand network, this paper proposes several alternative hybrid MPI+OpenSHMEM designs and performs a systemic evaluation and analysis on typical workloads. The hybrid designs leverage the one-sided memory access to better overlap communication with computation than the existing pure MPI design, and propose better schemes for efficient buffer management. The implementation based on k-NN program from MaTEx with MVAPICH2-X (Unified MPI+PGAS Communication Runtime over InfiniBand) shows up to 9.0% time reduction for training KDD Cup 2010 workload over 512 cores, and 27.6% time reduction for small workload with balanced communication and computation. Experiments of running with varied number of cores show that our design can maintain good scalability.
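To make the k-NN step in the abstract above concrete, here is a minimal single-process sketch of a partitioned k-NN classifier. It is not the paper's hybrid MPI+OpenSHMEM design and uses no MPI or OpenSHMEM calls. It only illustrates a common decomposition, assumed here for illustration, in which each partition of the training data yields its local k nearest candidates and the candidates are then merged. All names (`local_top_k`, `knn_classify`, the sample data) are hypothetical.

```python
import heapq
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]
Labeled = Tuple[Point, str]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance (the square root is not needed for ranking)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_top_k(chunk: List[Labeled], query: Point, k: int) -> List[Tuple[float, str]]:
    """Return the k nearest (distance, label) pairs found inside one partition."""
    return heapq.nsmallest(
        k, ((squared_distance(point, query), label) for point, label in chunk)
    )


def knn_classify(partitions: List[List[Labeled]], query: Point, k: int) -> str:
    """Classify `query` by majority vote over the k globally nearest neighbours."""
    candidates: List[Tuple[float, str]] = []
    # In a distributed setting each iteration would run on a different process;
    # here the partitions are simply visited one after another.
    for chunk in partitions:
        candidates.extend(local_top_k(chunk, query, k))
    # Merge step: the global top-k is contained in the union of the local top-k lists.
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    partitions = [
        [((0.0, 0.0), "a"), ((0.1, 0.0), "a")],
        [((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((0.0, 0.2), "a")],
    ]
    print(knn_classify(partitions, (0.0, 0.1), k=3))
```

Only k candidates per partition have to be exchanged in the merge step, which is why the volume of communication stays small compared with the distance computations.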
A survey of parallel programming tools

NASA Technical Reports Server (NTRS)

Cheng, Doreen Y.

1991-01-01

This survey examines 39 parallel programming tools. Focus is placed on those tool capabilities needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions; tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
Initial singularity and pure geometric field theories

NASA Astrophysics Data System (ADS)

Wanas, M. I.; Kamal, Mona M.; Dabash, Tahia F.

2018-01-01

In the present article we use a modified version of the geodesic equation, together with a modified version of the Raychaudhuri equation, to study initial singularities. These modified equations are used to account for the effect of the spin-torsion interaction on the existence of initial singularities in cosmological models. Such models are the results of solutions of the field equations of a class of field theories termed pure geometric. The geometric structure used in this study is an absolute parallelism structure satisfying the cosmological principle. It is shown that the existence of initial singularities is subject to some mathematical (geometric) conditions. The scheme suggested for this study can be easily generalized.
Backtracking and Re-execution in the Automatic Debugging of Parallelized Programs

NASA Technical Reports Server (NTRS)

Matthews, Gregory; Hood, Robert; Johnson, Stephen; Leggett, Peter; Biegel, Bryan (Technical Monitor)

2002-01-01

In this work we describe a new approach using relative debugging to find differences in computation between a serial program and a parallel version of that program. We use a combination of re-execution and backtracking in order to find the first difference in computation that may ultimately lead to an incorrect value that the user has indicated. In our prototype implementation we use static analysis information from a parallelization tool in order to perform the backtracking as well as the mapping required between serial and parallel computations.
Transport in a toroidally confined pure electron plasma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Crooks, S.M.; O'Neil, T.M.

1996-07-01

O'Neil and Smith [T.M. O'Neil and R.A. Smith, Phys. Plasmas 1, 8 (1994)] have argued that a pure electron plasma can be confined stably in a toroidal magnetic field configuration. This paper shows that the toroidal curvature of the magnetic field of necessity causes slow cross-field transport. The transport mechanism is similar to magnetic pumping and may be understood by considering a single flux tube of plasma. As the flux tube of plasma undergoes poloidal $E \times B$ drift rotation about the center of the plasma, the length of the flux tube and the magnetic field strength within the flux tube oscillate, and this produces corresponding oscillations in $T_\parallel$ and $T_\perp$. The collisional relaxation of $T_\parallel$ toward $T_\perp$ produces a slow dissipation of electrostatic energy into heat and a consequent expansion (cross-field transport) of the plasma. In the limit where the cross section of the plasma is nearly circular the radial particle flux is given by $\Gamma_r = \tfrac{1}{2}\,\nu_{\perp,\parallel}\,T\,(r/\rho_0)^2\,n/(-e\,\partial\Phi/\partial r)$, where $\nu_{\perp,\parallel}$ is the collisional equipartition rate, $\rho_0$ is the major radius at the center of the plasma, and $r$ is the minor radius measured from the center of the plasma. The transport flux is first calculated using this simple physical picture and then is calculated by solving the drift-kinetic Boltzmann equation. This latter calculation is not limited to a plasma with a circular cross section. © 1996 American Institute of Physics.
Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry

1998-01-01

This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

2015-01-01

This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Hao-Qiang; an Mey, Dieter; Hatay, Ferhat F.

2003-01-01

Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems.

PubMed

Stone, John E; Gohara, David; Shi, Guochun

2010-05-01

We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures.
Electrically tunable crossed Andreev reflection in a ferromagnet–superconductor–ferromagnet junction on a topological insulator

NASA Astrophysics Data System (ADS)

Zhang, Kunhua; Cheng, Qiang

2018-07-01

We investigate the crossed Andreev reflection in a ferromagnet–superconductor–ferromagnet junction on the surface of a topological insulator, where the magnetizations in the left and right leads are perpendicular to the surface. We find that the nonlocal transport process can be pure crossed Andreev reflection or pure elastic cotunneling, and the switch between the two processes can be controlled electrically. Pure crossed Andreev reflection appears for all bias voltages in the superconducting energy gap, which is independent of the configuration of the magnetizations in the two leads. The spin of the crossed Andreev reflected hole could be parallel to the spin of the incident electron, which is brought by the spin-triplet pairing correlation. The average transmission probability of crossed Andreev reflection can be larger than 90%, so a high efficiency nonlocal splitting of Cooper pairs can be generated, and turned on and off electrically.

Generation of a sub-half-wavelength focal spot with purely transverse spin angular momentum

NASA Astrophysics Data System (ADS)

Hang, Li; Fu, Jian; Yu, Xiaochang; Wang, Ying; Chen, Peifeng

2017-11-01

We theoretically demonstrate that optical focus fields with purely transverse spin angular momentum (SAM) can be obtained when a special kind of incident field is focused by a high numerical aperture (NA) aplanatic lens (AL). When the incident pupil fields are refracted by an AL, two transverse Cartesian components of the electric fields at the exit pupil plane do not have the same order of sinusoidal or cosinoidal components, resulting in zero longitudinal SAMs of the focal fields. An incident field satisfying the above conditions is then proposed. Using the Richards-Wolf vectorial diffraction theory, the energy density and SAM density distributions of the tightly focused beam are calculated and the results clearly validate the proposed theory. In addition, a sub-half-wavelength focal spot with purely transverse SAM can be achieved and a flattop energy density distribution parallel to the z-axis can be observed around the maximum energy density point.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; an Mey, Dieter; Hatay, Ferhat F.

2003-01-01

With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
Rubus: A compiler for seamless and extensible parallelism.

PubMed

Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores, whereas for a matrix multiplication benchmark an average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Rubus: A compiler for seamless and extensible parallelism

PubMed Central

Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores, whereas for a matrix multiplication benchmark an average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
Efficient partitioning and assignment on programs for multiprocessor execution

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1993-01-01

The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.
A mechanism for efficient debugging of parallel programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, B.P.; Choi, J.D.

1988-01-01

This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. They extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions of the co-operating processes.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

PubMed Central

Stone, John E.; Gohara, David; Shi, Guochun

2010-01-01

We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures. PMID:21037981
Pure Air's Bailly scrubber: A four-year retrospective

DOE Office of Scientific and Technical Information (OSTI.GOV)

Manavi, G.B.; Vymazal, D.C.; Sarkus, T.A.

1997-12-31

Pure Air's Advanced Flue Gas Desulfurization (AFGD) Clean Coal Project has completed four highly successful years of operation at NIPSCO's Bailly Station. As part of their program, Pure Air has concluded a six-part study of system performance. This paper summarizes the results of the demonstration program, including AFGD performance on coals ranging from 2.0 to 2.4% sulfur. The paper highlights novel aspects of the Bailly facility, including pulverized limestone injection, air rotary sparger for oxidation, wastewater evaporation system and the production of PowerChip® gypsum. Operations and maintenance, which have led to the facility's notable 99.47% availability record, are also discussed. A project company, Pure Air on the Lake Limited Partnership, owns the AFGD facility. Pure Air was the turnkey contractor and Air Products and Chemicals, Inc. is the operator of the AFGD system.
Genetic algorithms using SISAL parallel programming language

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tejada, S.

1994-05-06

Genetic algorithms are a mathematical optimization technique developed by John Holland at the University of Michigan [1]. The SISAL programming language possesses many of the characteristics desired to implement genetic algorithms. SISAL is a deterministic, functional programming language which is inherently parallel. Because SISAL is functional and based on mathematical concepts, genetic algorithms can be efficiently translated into the language. Several of the steps involved in genetic algorithms, such as mutation, crossover, and fitness evaluation, can be parallelized using SISAL. In this paper I will discuss the implementation and performance of parallel genetic algorithms in SISAL.
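The steps named in this abstract (fitness evaluation, crossover, mutation) can be sketched in a few lines. The sketch below is in Python rather than SISAL and is not the author's implementation; the OneMax objective, the population parameters, the elitism step, and all function names are assumptions chosen for illustration. It shows why fitness evaluation parallelizes naturally: each individual is scored by a pure function, so the scores can be computed by an independent map.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(bits):
    # Toy "OneMax" objective (an assumption): count the 1 bits.
    return sum(bits)


def crossover(a, b, rng):
    # Single-point crossover of two equal-length parents.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate=0.02):
    # Flip each bit independently with probability `rate`.
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def evolve(pop_size=20, length=16, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best_history = []
    with ThreadPoolExecutor() as pool:
        for _ in range(generations):
            # Fitness evaluation is a side-effect-free map over individuals,
            # so every call can run independently of the others.
            scores = list(pool.map(fitness, population))
            ranked = [ind for _, ind in sorted(zip(scores, population),
                                               key=lambda p: p[0],
                                               reverse=True)]
            best_history.append(max(scores))
            parents = ranked[: pop_size // 2]
            children = [
                mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                for _ in range(pop_size - 1)
            ]
            # Elitism: carry the best individual over unchanged.
            population = [ranked[0]] + children
    return best_history


if __name__ == "__main__":
    history = evolve()
    print("best fitness per generation:", history)
```

In CPython the thread pool mainly illustrates the structure, since the interpreter lock limits CPU parallelism; a functional language such as SISAL can exploit the same independence directly.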
An Expert System for the Development of Efficient Parallel Code

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

2004-01-01

We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.
Optics Program Modified for Multithreaded Parallel Computing

NASA Technical Reports Server (NTRS)

Lou, John; Bedding, Dave; Basinger, Scott

2006-01-01

A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
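As a small worked illustration of what "linear speedup" means in the abstract above, the following Python sketch computes speedup and parallel efficiency from wall-clock timings. The function names and the timing values are hypothetical and are not taken from the MACOS tests.

```python
def speedup(t_serial, t_parallel):
    # Speedup S = T_serial / T_parallel.
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    # Parallel efficiency E = S / p; E == 1.0 corresponds to linear speedup.
    return speedup(t_serial, t_parallel) / workers


if __name__ == "__main__":
    # Hypothetical timings: 8 s serial, 2 s on 4 processors.
    print(speedup(8.0, 2.0))        # speedup equals the processor count
    print(efficiency(8.0, 2.0, 4))  # efficiency of 1.0, i.e. linear
```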
The Effective Width of Curved Sheet After Buckling

NASA Technical Reports Server (NTRS)

Wenzek, W A

1938-01-01

This report describes experiments made for the purpose of ascertaining the effective width of circularly curved sheet under pure flexural stress. A relation for the effective width of curved sheets is established. Experiments were made with circular cylinders compressed in longitudinal direction. The sheets were rigidly built in at the sides parallel to the axis of the cylinder.
NDL-v2.0: A new version of the numerical differentiation library for parallel architectures

NASA Astrophysics Data System (ADS)

Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.

2014-07-01

We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library has been based on the lightweight OpenMP tasking model allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes. Catalog identifier: AEDG_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 63036 No. of bytes in distributed program, including test data, etc.: 801872 Distribution format: tar.gz Programming language: ANSI Fortran-77, ANSI C, Python. Computer: Distributed systems (clusters), shared memory systems. Operating system: Linux, Unix. Has the code been vectorized or parallelized?: Yes. RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N^2) internal storage for Hessian calculations, if a task throttling factor has not been set by the user. Classification: 4.9, 4.14, 6.5. Catalog identifier of previous version: AEDG_v1_0 Journal reference of previous version: Comput. Phys. Comm. 
180(2009)1404 Does the new version supersede the previous version?: Yes Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, and sensitivity analysis. For a large number of scientific and engineering applications, the underlying functions correspond to simulation codes for which analytical estimation of derivatives is difficult or almost impossible. A parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Reasons for new version: The updated version was motivated by our endeavors to extend a parallel Bayesian uncertainty quantification framework [1], by incorporating higher order derivative information as in most state-of-the-art stochastic simulation methods such as Stochastic Newton MCMC [2] and Riemannian Manifold Hamiltonian MC [3]. The function evaluations are simulations with significant time-to-solution, which also varies with the input parameters such as in [1, 4]. The runtime of the N-body-type of problem changes considerably with the introduction of a longer cut-off between the bodies. In the first version of the library, the OpenMP-parallel subroutines spawn a new team of threads and distribute the function evaluations with a PARALLEL DO directive. This limits the functionality of the library as multiple concurrent calls require nested parallelism support from the OpenMP environment. Therefore, either their function evaluations will be serialized or processor oversubscription is likely to occur due to the increased number of OpenMP threads. 
In addition, the Hessian calculations include two explicit parallel regions that compute first the diagonal and then the off-diagonal elements of the array. Due to the barrier between the two regions, the parallelism of the calculations is not fully exploited. These issues have been addressed in the new version by first restructuring the serial code and then running the function evaluations in parallel using OpenMP tasks. Although the MPI-parallel implementation of the first version is capable of fully exploiting the task parallelism of the PNDL routines, it does not utilize the caching mechanism of the serial code and, therefore, performs some redundant function evaluations in the Hessian and Jacobian calculations. This can lead to: (a) higher execution times if the number of available processors is lower than the total number of tasks, and (b) significant energy consumption due to wasted processor cycles. Overcoming these drawbacks, which become critical as the time of a single function evaluation increases, was the primary goal of this new version. Due to the code restructure, the MPI-parallel implementation (and the OpenMP-parallel in accordance) avoids redundant calls, providing optimal performance in terms of the number of function evaluations. Another limitation of the library was that the library subroutines were collective and synchronous calls. In the new version, each MPI process can issue any number of subroutines for asynchronous execution. We introduce two library calls that provide global and local task synchronizations, similarly to the BARRIER and TASKWAIT directives of OpenMP. The new MPI-implementation is based on TORC, a new tasking library for multicore clusters [5-7]. TORC improves the portability of the software, as it relies exclusively on the POSIX-Threads and MPI programming interfaces. 
It allows MPI processes to utilize multiple worker threads, offering a hybrid programming and execution environment similar to MPI+OpenMP, in a completely transparent way. Finally, to further improve the usability of our software, a Python interface has been implemented on top of both the OpenMP and MPI versions of the library. This allows sequential Python codes to exploit shared and distributed memory systems. Summary of revisions: The revised code improves the performance of both parallel (OpenMP and MPI) implementations. The functionality and the user-interface of the MPI-parallel version have been extended to support the asynchronous execution of multiple PNDL calls, issued by one or multiple MPI processes. A new underlying tasking library increases portability and allows MPI processes to have multiple worker threads. For both implementations, an interface to the Python programming language has been added. Restrictions: The library uses only double precision arithmetic. The MPI implementation assumes the homogeneity of the execution environment provided by the operating system. Specifically, the processes of a single MPI application must have identical address space and a user function resides at the same virtual address. In addition, address space layout randomization should not be used for the application. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 23 ms for the serial distribution, 25 ms for the OpenMP with 2 threads, 53 ms and 1.01 s for the MPI parallel distribution using 2 threads and 2 processes respectively and yield-time for idle workers equal to 10 ms. References: [1] P. Angelikopoulos, C. Paradimitriou, P. 
Koumoutsakos, Bayesian uncertainty quantification and propagation in molecular dynamics simulations: a high performance computing framework, J. Chem. Phys 137 (14). [2] H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput. 33 (1) (2011) 407-432. [3] M. Girolami, B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73 (2) (2011) 123-214. [4] P. Angelikopoulos, C. Paradimitriou, P. Koumoutsakos, Data driven, predictive molecular dynamics for nanoscale flow simulations under uncertainty, J. Phys. Chem. B 117 (47) (2013) 14808-14816. [5] P.E. Hadjidoukas, E. Lappas, V.V. Dimakopoulos, A runtime library for platform-independent task parallelism, in: PDP, IEEE, 2012, pp. 229-236. [6] C. Voglis, P.E. Hadjidoukas, D.G. Papageorgiou, I. Lagaris, A parallel hybrid optimization algorithm for fitting interatomic potentials, Appl. Soft Comput. 13 (12) (2013) 4481-4492. [7] P.E. Hadjidoukas, C. Voglis, V.V. Dimakopoulos, I. Lagaris, D.G. Papageorgiou, Supporting adaptive and irregular parallelism for non-linear numerical optimization, Appl. Math. Comput. 231 (2014) 544-559.
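The solution method described in this record (finite differencing with a step that balances truncation and round-off error) can be illustrated with a minimal sketch. This is not the NDL code or its API: the function name `central_gradient` and the step rule h = eps^(1/3) * max(1, |x_i|), a common textbook choice for central differences, are assumptions made for illustration. Each pair of function evaluations is independent of the others, which is the property a task-based parallelization can exploit.

```python
import sys


def central_gradient(f, x):
    """Estimate the gradient of f at x with central differences.

    The step for each coordinate is eps**(1/3) * max(1, |x_i|), which
    roughly balances the O(h^2) truncation error of the central formula
    against the O(eps / h) round-off error (an assumed textbook rule).
    """
    eps = sys.float_info.epsilon
    grad = []
    for i, xi in enumerate(x):
        h = eps ** (1.0 / 3.0) * max(1.0, abs(xi))
        forward = list(x)
        backward = list(x)
        forward[i] = xi + h
        backward[i] = xi - h
        # These two evaluations do not depend on any other coordinate,
        # so they could be issued as independent tasks.
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    def f(v):
        return v[0] ** 2 + 3.0 * v[0] * v[1]

    # Analytic gradient at (1, 2) is (2*1 + 3*2, 3*1) = (8, 3).
    print(central_gradient(f, [1.0, 2.0]))
```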
Solving Integer Programs from Dependence and Synchronization Problems

DTIC Science & Technology

1993-03-01

Solving Integer Programs from Dependence and Synchronization Problems. Jaspal Subhlok, March 1993, CMU-CS-93-130, School of Computer Science ... method is an exact and efficient way of solving integer programming problems arising in dependence and synchronization analysis of parallel programs ... Keywords: exact dependence testing, integer programming, parallelizing compilers, parallel program analysis, synchronization analysis
Observing with HST V: Improvements to the Scheduling of HST Parallel Observations

NASA Astrophysics Data System (ADS)

Taylor, D. K.; Vanorsow, D.; Lucks, M.; Henry, R.; Ratnatunga, K.; Patterson, A.

1994-12-01

Recent improvements to the Hubble Space Telescope (HST) ground system have significantly increased the frequency of pure parallel observations, i.e. the simultaneous use of multiple HST instruments by different observers. Opportunities for parallel observations are limited by a variety of timing, hardware, and scientific constraints. Formerly, such opportunities were heuristically predicted prior to the construction of the primary schedule (or calendar), and lack of complete information resulted in high rates of scheduling failures and missed opportunities. In the current process the search for parallel opportunities is delayed until the primary schedule is complete, at which point new software tools are employed to identify places where parallel observations are supported. The result has been a considerable increase in parallel throughput. A new technique, known as "parallel crafting," is currently under development to streamline further the parallel scheduling process. This radically new method will replace the standard exposure logsheet with a set of abstract rules from which observation parameters will be constructed "on the fly" to best match the constraints of the parallel opportunity. Currently, parallel observers must specify a huge (and highly redundant) set of exposure types in order to cover all possible types of parallel opportunities. Crafting rules permit the observer to express timing, filter, and splitting preferences in a far more succinct manner. The issue of coordinated parallel observations (same PI using different instruments simultaneously), long a troublesome aspect of the ground system, is also being addressed. For Cycle 5, the Phase II Proposal Instructions now have an exposure-level PAR WITH special requirement. While only the primary's alignment will be scheduled on the calendar, new commanding will provide for parallel exposures with both instruments.
NIST Gas Hydrate Research Database and Web Dissemination Channel.

PubMed

Kroenlein, K; Muzny, C D; Kazakov, A; Diky, V V; Chirico, R D; Frenkel, M; Sloan, E D

2010-01-01

To facilitate advances in application of technologies pertaining to gas hydrates, a freely available data resource containing experimentally derived information about those materials was developed. This work was performed by the Thermodynamic Research Center (TRC) paralleling a highly successful database of thermodynamic and transport properties of molecular pure compounds and their mixtures. Population of the gas-hydrates database required development of guided data capture (GDC) software designed to convert experimental data and metadata into a well organized electronic format, as well as a relational database schema to accommodate all types of numerical and metadata within the scope of the project. To guarantee utility for the broad gas hydrate research community, TRC worked closely with the Committee on Data for Science and Technology (CODATA) task group for Data on Natural Gas Hydrates, an international data sharing effort, in developing a gas hydrate markup language (GHML). The fruits of these efforts are disseminated through the NIST Standard Reference Data Program [1] as the Clathrate Hydrate Physical Property Database (SRD #156). A web-based interface for this database, as well as scientific results from the Mallik 2002 Gas Hydrate Production Research Well Program [2], is deployed at http://gashydrates.nist.gov.
The FORCE - A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
Optical correlation identification technology applied in underwater laser imaging target identification

NASA Astrophysics Data System (ADS)

Yao, Guang-tao; Zhang, Xiao-hui; Ge, Wei-long

2012-01-01

Underwater laser imaging is an effective method for detecting short-range underwater targets and an important complement to sonar detection. With the development of underwater laser imaging technology and underwater vehicle technology, underwater automatic target identification has received more and more attention, and it remains a difficult research problem in underwater optical imaging information processing. Today, underwater automatic target identification based on optical imaging is usually realized by software programming on digital circuits. Algorithm implementation and control with this method are very flexible. However, the optical imaging information is a 2D or even 3D image and the amount of information to process is large, so electronic hardware running a purely digital algorithm needs a long identification time and can hardly meet the demands of real-time identification. Parallel computer processing can improve the identification speed, but it increases complexity, size and power consumption. This paper attempts to apply optical correlation identification technology to underwater automatic target identification. Optical correlation identification exploits the Fourier transform property of a Fourier lens, which can accomplish the Fourier transform of image information on the nanosecond scale; optical spatial interconnection computation is parallel, fast, high-capacity and high-resolution, and combining it with the flexible computation and control of the digital circuit method yields an optoelectronic hybrid identification mode. We derive the theoretical formulation of correlation identification, analyze the principle of optical correlation identification, and write a MATLAB simulation program. We identify targets in single-frame images obtained by underwater range-gated laser imaging; by identifying and locating the target at different positions, we can effectively improve the speed and orientation efficiency of target identification, and we provide a preliminary validation of the feasibility of this method.
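The correlation identification idea in this abstract can be illustrated with a small sketch. The code below is in Python rather than the MATLAB used by the authors and does not reproduce their simulation. It computes an unnormalized cross-correlation directly in the spatial domain and reports where the correlation peak falls; by the correlation theorem, an optical correlator obtains the same quantity through Fourier transforms. The function name, the toy scene, and the template are assumptions.

```python
def locate_by_correlation(scene, template):
    """Return ((row, col), score) of the strongest correlation peak.

    Direct (unnormalized) cross-correlation of a small template against
    every valid offset in the scene. Each offset is scored independently,
    which is the parallelism an optical system evaluates all at once.
    """
    scene_h, scene_w = len(scene), len(scene[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_score, best_pos = float("-inf"), None
    for r in range(scene_h - tmpl_h + 1):
        for c in range(scene_w - tmpl_w + 1):
            score = sum(
                scene[r + i][c + j] * template[i][j]
                for i in range(tmpl_h)
                for j in range(tmpl_w)
            )
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


if __name__ == "__main__":
    template = [[1, 2], [3, 4]]
    # Hypothetical 6x6 dark scene with the target placed at row 2, column 3.
    scene = [[0] * 6 for _ in range(6)]
    for i in range(2):
        for j in range(2):
            scene[2 + i][3 + j] = template[i][j]
    print(locate_by_correlation(scene, template))
```

Because the correlation here is unnormalized, a bright background region could outscore the true target in a real image; a normalized measure would be the usual remedy.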
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

DOE PAGES

Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

2013-01-01

Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
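The definition of work time inflation given in this abstract can be turned into a short arithmetic sketch. The helper names and the timing values below are hypothetical; the code only restates the definition (time spent doing work across all threads, minus the time the same work takes sequentially) and is not the authors' measurement framework.

```python
def work_time_inflation(sequential_work, per_thread_work):
    # Extra time spent on work in the multithreaded run, summed over
    # threads, beyond the sequential work time. Idle time and scheduling
    # overhead are assumed to be excluded from `per_thread_work`.
    return sum(per_thread_work) - sequential_work


def inflation_factor(sequential_work, per_thread_work):
    # Ratio form: 1.0 means no inflation.
    return sum(per_thread_work) / sequential_work


if __name__ == "__main__":
    # Hypothetical measurements in seconds: 10 s of sequential work,
    # and four threads that together spend 14 s doing that same work.
    threads = [3.5, 3.0, 3.5, 4.0]
    print(work_time_inflation(10.0, threads))  # 4.0 s of inflation
    print(inflation_factor(10.0, threads))
```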

Distributed and parallel Ada and the Ada 9X recommendations

NASA Technical Reports Server (NTRS)

Volz, Richard A.; Goldsack, Stephen J.; Theriault, R.; Waldrop, Raymond S.; Holzbacher-Valero, A. A.

1992-01-01

Recently, the DoD has sponsored work towards a new version of Ada, intended to support the construction of distributed systems. The revised version, often called Ada 9X, will become the new standard sometime in the 1990s. It is intended that Ada 9X should provide language features giving limited support for distributed system construction. The requirements for such features are given. Many of the most advanced computer applications involve embedded systems that are comprised of parallel processors or networks of distributed computers. If Ada is to become the widely adopted language envisioned by many, it is essential that suitable compilers and tools be available to facilitate the creation of distributed and parallel Ada programs for these applications. The major language issues impacting distributed and parallel programming are reviewed, and some principles upon which distributed/parallel language systems should be built are suggested. Based upon these, alternative language concepts for distributed/parallel programming are analyzed.
Spherical roller bearing analysis. SKF computer program SPHERBEAN. Volume 3: Program correlation with full scale hardware tests

NASA Technical Reports Server (NTRS)

Kleckner, R. J.; Rosenlieb, J. W.; Dyba, G.

1980-01-01

The results of a series of full scale hardware tests comparing predictions of the SPHERBEAN computer program with measured data are presented. The SPHERBEAN program predicts the thermomechanical performance characteristics of high speed lubricated double row spherical roller bearings. The degree of correlation between performance predicted by SPHERBEAN and measured data is demonstrated. Experimental and calculated performance data is compared over a range in speed up to 19,400 rpm (0.8 MDN) under pure radial, pure axial, and combined loads.
Implementation and performance of parallel Prolog interpreter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei, S.; Kale, L.V.; Balkrishna, R.

1988-01-01

In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared memory systems (an Alliant FX/8, a Sequent and a MultiMax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.
The Theory of Localist Representation and of a Purely Abstract Cognitive System: The Evidence from Cortical Columns, Category Cells, and Multisensory Neurons.

PubMed

Roy, Asim

2017-01-01

The debate about representation in the brain and the nature of the cognitive system has been going on for decades now. This paper examines the neurophysiological evidence, primarily from single cell recordings, to get a better perspective on both the issues. After an initial review of some basic concepts, the paper reviews the data from single cell recordings - in cortical columns and of category-selective and multisensory neurons. In neuroscience, columns in the neocortex (cortical columns) are understood to be a basic functional/computational unit. The paper reviews the fundamental discoveries about the columnar organization and finds that it reveals a massively parallel search mechanism. This columnar organization could be the most extensive neurophysiological evidence for the widespread use of localist representation in the brain. The paper also reviews studies of category-selective cells. The evidence for category-selective cells reveals that localist representation is also used to encode complex abstract concepts at the highest levels of processing in the brain. A third major issue is the nature of the cognitive system in the brain and whether there is a form that is purely abstract and encoded by single cells. To provide evidence for a single-cell based purely abstract cognitive system, the paper reviews some of the findings related to multisensory cells. It appears that there is widespread usage of multisensory cells in the brain in the same areas where sensory processing takes place. Plus there is evidence for abstract modality invariant cells at higher levels of cortical processing. Overall, that reveals the existence of a purely abstract cognitive system in the brain. The paper also argues that since there is no evidence for dense distributed representation and since sparse representation is actually used to encode memories, there is actually no evidence for distributed representation in the brain. 
Overall, it appears that, at an abstract level, the brain is a massively parallel, distributed computing system that is symbolic. The paper also explains how grounded cognition and other theories of the brain are fully compatible with localist representation and a purely abstract cognitive system.
The Theory of Localist Representation and of a Purely Abstract Cognitive System: The Evidence from Cortical Columns, Category Cells, and Multisensory Neurons

PubMed Central

Roy, Asim

2017-01-01

The debate about representation in the brain and the nature of the cognitive system has been going on for decades now. This paper examines the neurophysiological evidence, primarily from single cell recordings, to get a better perspective on both the issues. After an initial review of some basic concepts, the paper reviews the data from single cell recordings – in cortical columns and of category-selective and multisensory neurons. In neuroscience, columns in the neocortex (cortical columns) are understood to be a basic functional/computational unit. The paper reviews the fundamental discoveries about the columnar organization and finds that it reveals a massively parallel search mechanism. This columnar organization could be the most extensive neurophysiological evidence for the widespread use of localist representation in the brain. The paper also reviews studies of category-selective cells. The evidence for category-selective cells reveals that localist representation is also used to encode complex abstract concepts at the highest levels of processing in the brain. A third major issue is the nature of the cognitive system in the brain and whether there is a form that is purely abstract and encoded by single cells. To provide evidence for a single-cell based purely abstract cognitive system, the paper reviews some of the findings related to multisensory cells. It appears that there is widespread usage of multisensory cells in the brain in the same areas where sensory processing takes place. Plus there is evidence for abstract modality invariant cells at higher levels of cortical processing. Overall, that reveals the existence of a purely abstract cognitive system in the brain. The paper also argues that since there is no evidence for dense distributed representation and since sparse representation is actually used to encode memories, there is actually no evidence for distributed representation in the brain. 
Overall, it appears that, at an abstract level, the brain is a massively parallel, distributed computing system that is symbolic. The paper also explains how grounded cognition and other theories of the brain are fully compatible with localist representation and a purely abstract cognitive system. PMID:28261127
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele

2001-01-01

This viewgraph presentation provides information on support sources available for the automatic parallelization of computer programs. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence, ensuring that the code transformation was accurate.
Computer program to simulate Raman scattering

NASA Technical Reports Server (NTRS)

Zilles, B.; Carter, R.

1977-01-01

A computer program is described for simulating the vibration-rotation and pure rotational spectrum of a combustion system consisting of various diatomic molecules and CO2 as a function of temperature and number density. Two kinds of spectra are generated: a pure rotational spectrum for any mixture of diatomic and linear triatomic molecules, and a vibrational spectrum for diatomic molecules. The program is designed to accept independent rotational and vibrational temperatures for each molecule, as well as number densities.
Dendritic Growth with Fluid Flow for Pure Materials

NASA Technical Reports Server (NTRS)

Jeong, Jun-Ho; Dantzig, Jonathan A.; Goldenfeld, Nigel

2003-01-01

We have developed a three-dimensional, adaptive, parallel finite element code to examine solidification of pure materials under conditions of forced flow. We have examined the effect of undercooling, surface tension anisotropy and imposed flow velocity on the growth. The flow significantly alters the growth process, producing dendrites that grow faster, and with greater tip curvature, into the flow. The selection constant decreases slightly with flow velocity in our calculations. The results of the calculations agree well with the transport solution of Saville and Beaghton at high undercooling and high anisotropy. At low undercooling, significant deviations are found. We attribute this difference to the influence of other parts of the dendrite, removed from the tip, on the flow field.
Edge profiles and limiter tests in Extrap T2

NASA Astrophysics Data System (ADS)

Bergsåker, H.; Hedin, G.; Ilyinsky, L.; Larsson, D.; Möller, A.

New edge profile measurements, including calorimetric measurements of the parallel heat flux, were made in Extrap T2. Test limiters of pure molybdenum and the TZM molybdenum alloy have been exposed in the edge plasma. The surface damage was studied, mainly by microscopy. Tungsten coated graphite probes were also exposed, and the surfaces were studied by microscopy, ion beam analysis and XPS. In this case cracking and mixing of carbon and tungsten at the interface were observed in the most heated areas, whereas carbide formation at the surface was seen in less heated areas. In these tests pure Mo generally fared better than TZM, and thinner and cleaner coatings fared better than thicker and less clean ones.
Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

NASA Astrophysics Data System (ADS)

Work, Paul R.

1991-12-01

This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

1997-01-01

Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore exploit, behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring System (AIMS) has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAS Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
A determination of the external forces required to move the benchmark active controls testing model in pure plunge and pure pitch

NASA Technical Reports Server (NTRS)

Dcruz, Jonathan

1993-01-01

In view of the strong need for a well-documented set of experimental data which is suitable for the validation and/or calibration of modern Computational Fluid Dynamics codes, the Benchmark Models Program was initiated by the Structural Dynamics Division of the NASA Langley Research Center. One of the models in the program, the Benchmark Active Controls Testing Model, consists of a rigid wing of rectangular planform with a NACA 0012 profile and three control surfaces (a trailing-edge control surface, a lower-surface spoiler, and an upper-surface spoiler). The model is affixed to a flexible mount system which allows only plunging and/or pitching motion. An approximate analytical determination of the forces required to move this model, with its control surfaces fixed, in pure plunge and pure pitch at a number of test conditions is included. This provides a good indication of the type of actuator system required to generate the aerodynamic data resulting from pure plunging and pure pitching motion, in which much interest was expressed. The analysis makes use of previously obtained numerical results.
What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

2004-01-01

With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
Performance Evaluation in Network-Based Parallel Computing

NASA Technical Reports Server (NTRS)

Dezhgosha, Kamyar

1996-01-01

Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing SUNSPARCs' network with PVM (Parallel Virtual Machine), which is a software system for linking clusters of machines. Second, a set of three basic applications was selected. The applications consist of a parallel search, a parallel sort, and a parallel matrix multiplication. These application programs were implemented in the C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time, which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes is in many cases the restricting factor to performance. That is, coarse-grain parallelism, which requires less frequent communication between processes, will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.
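The relationship between elapsed time, speedup, and communication frequency described in this abstract can be illustrated with a minimal sketch. The cost model and every number below are assumptions chosen for illustration; none of them comes from the report.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Speedup S = T_serial / T_parallel."""
    return serial_time / parallel_time


def modeled_parallel_time(work: float, workers: int, messages: int, latency: float) -> float:
    """Toy cost model (an assumption, not taken from the report):
    perfectly divisible work plus a fixed latency paid for every message."""
    return work / workers + messages * latency


WORK = 100.0     # hypothetical serial run time, in seconds
WORKERS = 8      # hypothetical number of worker processes
LATENCY = 0.01   # hypothetical per-message latency, in seconds

# Coarse-grain decomposition: few, large tasks and therefore few messages.
coarse = speedup(WORK, modeled_parallel_time(WORK, WORKERS, messages=10, latency=LATENCY))

# Fine-grain decomposition: many small tasks and therefore many messages.
fine = speedup(WORK, modeled_parallel_time(WORK, WORKERS, messages=1000, latency=LATENCY))
```

Under this toy model the coarse-grain configuration reaches the higher speedup, because the fixed latency is paid far fewer times.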
WFIRST: Science from the Guest Investigator and Parallel Observation Programs

NASA Astrophysics Data System (ADS)

Postman, Marc; Nataf, David; Furlanetto, Steve; Milam, Stephanie; Robertson, Brant; Williams, Ben; Teplitz, Harry; Moustakas, Leonidas; Geha, Marla; Gilbert, Karoline; Dickinson, Mark; Scolnic, Daniel; Ravindranath, Swara; Strolger, Louis; Peek, Joshua; Marc Postman

2018-01-01

The Wide Field InfraRed Survey Telescope (WFIRST) mission will provide an extremely rich archival dataset that will enable a broad range of scientific investigations beyond the initial objectives of the proposed key survey programs. The scientific impact of WFIRST will thus be significantly expanded by a robust Guest Investigator (GI) archival research program. We will present examples of GI research opportunities ranging from studies of the properties of a variety of Solar System objects, surveys of the outer Milky Way halo, comprehensive studies of cluster galaxies, to unique and new constraints on the epoch of cosmic re-ionization and the assembly of galaxies in the early universe. WFIRST will also support the acquisition of deep wide-field imaging and slitless spectroscopic data obtained in parallel during campaigns with the coronagraphic instrument (CGI). These parallel wide-field imager (WFI) datasets can provide deep imaging data covering several square degrees at no impact to the scheduling of the CGI program. A competitively selected program of well-designed parallel WFI observation programs will, like the GI science above, maximize the overall scientific impact of WFIRST. We will give two examples of parallel observations that could be conducted during a proposed CGI program centered on a dozen nearby stars.
Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

DOE PAGES

Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

2015-01-01

This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
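The replacement of a sequential summation by a binary tree reduction can be sketched as follows. This is a serial Python stand-in written for illustration, not the authors' coarray Fortran code; the function name `tree_sum` and the list of per-image partial sums are hypothetical.

```python
def tree_sum(partials):
    """Sum per-image partial results with a binary-tree pattern.

    Each pass of the while loop is one round. Within a round the pairwise
    additions touch disjoint elements, so on a parallel machine they could
    run concurrently on different images.
    Returns (total, number_of_rounds).
    """
    values = list(partials)
    if not values:
        return 0, 0
    n = len(values)
    stride = 1
    rounds = 0
    while stride < n:
        # Element i absorbs the partial sum held `stride` positions away.
        for i in range(0, n - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        rounds += 1
    return values[0], rounds


# 32 hypothetical partial sums, one per core of a 32-core server.
total, rounds = tree_sum(range(1, 33))
```

A sequential summation over 32 partial results needs 31 dependent additions, whereas the tree pattern finishes in 5 rounds, which is why the reduction stops being the bottleneck as the core count grows.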
Programming Probabilistic Structural Analysis for Parallel Processing Computer

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

1991-01-01

The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

NASA Technical Reports Server (NTRS)

Weeks, Cindy Lou

1986-01-01

Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
Performance Implications of Synchronization Support for Parallel FORTRAN Programs

DTIC Science & Technology

1991-06-17

applications we used in this study are BDNA and FLO52. BDNA is a molecular dynamics simulator for biomolecules in water and it uses ordinary...parallelism structures and loop granularity. In the BDNA program, most of the parallel loops are not nested and the iterations are 200-1000 instructions long...are of concern. The BDNA curve in Figure 21 shows that for this program only 17% of all...

Parallelization of Program to Optimize Simulated Trajectories (POST3D)

NASA Technical Reports Server (NTRS)

Hammond, Dana P.; Korte, John J. (Technical Monitor)

2001-01-01

This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
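Gradient evaluation parallelizes naturally because each component can be computed independently. The sketch below assumes a central finite-difference gradient and a thread pool; the abstract does not say how POST3D forms its gradients, and `objective` is a hypothetical stand-in for an expensive trajectory simulation.

```python
from concurrent.futures import ThreadPoolExecutor


def objective(x):
    """Hypothetical stand-in for an expensive trajectory simulation."""
    return sum((xi - 1.0) ** 2 for xi in x)


def partial_derivative(f, x, i, h=1e-6):
    """Central-difference estimate of df/dx_i (an assumed scheme)."""
    forward = list(x)
    backward = list(x)
    forward[i] += h
    backward[i] -= h
    return (f(forward) - f(backward)) / (2.0 * h)


def parallel_gradient(f, x, max_workers=4):
    """Evaluate every gradient component as an independent task."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda i: partial_derivative(f, x, i), range(len(x))))


gradient = parallel_gradient(objective, [0.0, 2.0, 1.0])
```

The thread pool only illustrates the task decomposition. In CPython it does not speed up CPU-bound work, and an SPMD code like the one described would instead assign the components to separate processes.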
Selective, Embedded, Just-In-Time Specialization (SEJITS): Portable Parallel Performance from Sequential, Productive, Embedded Domain-Specific Languages

DTIC Science & Technology

2012-12-01

identity operation SIMD Single instruction, multiple datastream parallel computing Scala A byte-compiled programming language featuring dynamic type...Specific Languages 5a. CONTRACT NUMBER FA8750-10-1-0191 5b. GRANT NUMBER N/A 5c. PROGRAM ELEMENT NUMBER 61101E 6. AUTHOR(S) Armando Fox 5d...application performance, but usually must rely on efficiency programmers who are experts in explicit parallel programming to achieve it. Since such efficiency
Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

PubMed

Bellucci, Michael A; Coker, David F

2011-07-28

We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency are tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in the gas phase and in protic solvent. © 2011 American Institute of Physics
C to VHDL compiler

NASA Astrophysics Data System (ADS)

Berdychowski, Piotr P.; Zabolotny, Wojciech M.

2010-09-01

The main goal of the C to VHDL compiler project is to make the FPGA platform more accessible to scientists and software developers. The FPGA platform offers the unique ability to configure the hardware to implement virtually any dedicated architecture, and modern devices provide enough hardware resources to implement parallel execution platforms with complex processing units. All this makes the FPGA platform very attractive for those looking for an efficient, heterogeneous computing environment. The current industry standard for developing digital systems on the FPGA platform is based on HDLs. Although very effective and expressive in the hands of hardware development specialists, these languages require specific knowledge and experience that most scientists and software programmers lack. The C to VHDL compiler project attempts to remedy that by creating an application that derives an initial VHDL description of a digital system (for further compilation and synthesis) from a purely algorithmic description in the C programming language. The idea itself is not new, and the C to VHDL compiler combines the best approaches from existing solutions developed over many years with some new, unique improvements.
Automatic mesh refinement and parallel load balancing for Fokker-Planck-DSMC algorithm

NASA Astrophysics Data System (ADS)

Küchlin, Stephan; Jenny, Patrick

2018-06-01

Recently, a parallel Fokker-Planck-DSMC algorithm for rarefied gas flow simulation in complex domains at all Knudsen numbers was developed by the authors. Fokker-Planck-DSMC (FP-DSMC) is an augmentation of the classical DSMC algorithm, which mitigates the near-continuum deficiencies in terms of computational cost of pure DSMC. At each time step, based on a local Knudsen number criterion, the discrete DSMC collision operator is dynamically switched to the Fokker-Planck operator, which is based on the integration of continuous stochastic processes in time, and has fixed computational cost per particle, rather than per collision. In this contribution, we present an extension of the previous implementation with automatic local mesh refinement and parallel load-balancing. In particular, we show how the properties of discrete approximations to space-filling curves enable an efficient implementation. Exemplary numerical studies highlight the capabilities of the new code.
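How a space-filling curve can drive load balancing is sketched below. The abstract does not name the curve or the partitioning rule, so this example assumes a two-dimensional Morton (Z-order) curve and a simple cut into contiguous chunks of similar load; the function names and the test grid are hypothetical.

```python
def morton_key(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the bits of two cell indices into a Z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def partition_by_curve(cell_loads: dict, ranks: int) -> list:
    """Cut the curve-ordered cell list into contiguous chunks of similar load.

    cell_loads maps (ix, iy) -> load, for example a particle count.
    The total load is assumed to be positive.
    Returns one list of cells per rank.
    """
    ordered = sorted(cell_loads, key=lambda c: morton_key(*c))
    total = sum(cell_loads.values())
    parts = [[] for _ in range(ranks)]
    running = 0.0
    for cell in ordered:
        # Assign by the midpoint of the cell's load interval along the curve.
        mid = running + cell_loads[cell] / 2.0
        rank = min(int(mid * ranks / total), ranks - 1)
        parts[rank].append(cell)
        running += cell_loads[cell]
    return parts


# Hypothetical 4 x 4 grid with one unit of load per cell, split over 4 ranks.
loads = {(x, y): 1 for x in range(4) for y in range(4)}
parts = partition_by_curve(loads, 4)
```

Because neighbouring keys on the curve tend to be neighbouring cells in space, each rank receives a spatially compact block, and rebalancing after mesh refinement only requires moving the cut points along the one-dimensional ordering.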
Concepts of Concurrent Programming

DTIC Science & Technology

1990-04-01

to the material presented. Carriero89 Carriero, N., and Gelernter, D. "How to Write Parallel Programs: A Guide to the Perplexed." ACM...between the architectures on which programs can be executed and the application domains from which problems are drawn. Our goal is to show how programs...Sept. 1989), 251-510. Abstract: There are four papers: 1. Programming Languages for Distributed Computing Systems (52); 2. How to Write Parallel
NavP: Structured and Multithreaded Distributed Parallel Programming

NASA Technical Reports Server (NTRS)

Pan, Lei; Xu, Jingling

2006-01-01

This slide presentation reviews some of the issues around distributed parallel programming. It compares and contrasts two methods of programming: Single Program Multiple Data (SPMD) and Navigational Programming (NavP). It then reviews the distributed sequential computing (DSC) method and the methodology of NavP. Case studies are presented. It also reviews the work that is being done to enable the NavP system.
High Performance Programming Using Explicit Shared Memory Model on Cray T3D

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Saini, Subhash; Grassi, Charles

1994-01-01

The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than that obtained by using the explicit shared memory model. This degradation in performance is also seen on the CM-5, where the performance of applications using the native message-passing library CMMD is also about 4 to 5 times less than that obtained using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in the explicit shared memory model are discussed. Comparative performance of NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
On program restructuring, scheduling, and communication for parallel processor systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Polychronopoulos, Constantine D.

1986-08-01

This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.
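Loop coalescing, one of the two restructuring techniques named in this abstract, can be illustrated in its commonly described form: a loop nest is flattened into a single loop and the original indices are recovered from the flat counter. The sketch below is a generic Python illustration under that assumption; the dissertation's own formulation may differ in detail.

```python
def nested(n: int, m: int):
    """Original doubly nested loop over an n-by-m iteration space."""
    visited = []
    for i in range(n):
        for j in range(m):
            visited.append((i, j))
    return visited


def coalesced(n: int, m: int):
    """Single loop over n*m iterations; the indices are recovered from k."""
    visited = []
    for k in range(n * m):
        i, j = divmod(k, m)  # i = k // m, j = k % m
        visited.append((i, j))
    return visited
```

With one flat iteration space of n*m independent iterations, a scheduler can hand out work from a single counter instead of managing each nesting level separately.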
Linear polarimetry of AP stars. IV. The influence of deviations from a pure dipolar model.

NASA Astrophysics Data System (ADS)

Leroy, J. L.; Landolfi, M.; Landi Degl'Innocenti, M.; Landi Degl'Innocenti, E.; Bagnulo, S.; Laporte, P.

1995-09-01

In the previous papers of this series we have described a new observational program of broadband linear polarimetry aimed at Ap stars. At the same time, we have established a canonical model, based on the oblique rotator geometry, which describes successfully the main features of the observed polarization: in some cases the linear polarization data, combined with the classical circular polarization measurements, allow one to determine the characteristic parameters which define the oblique dipolar rotator. However, we have also observed polarization diagrams that depart clearly from those predicted by the canonical model, which means that it is not always possible to rely on a pure dipolar model (nor on a combination of a dipole plus a linear quadrupole parallel to the dipole). Although an interpretation of the polarization peculiarities in terms of magnetic `anomalies' (i.e. deviations from the dipolar configuration) is quite natural, one must also take into account the possible influence of local abundance inhomogeneities. Therefore, we have first studied the sensitivity of the polarized signal (which is known to be due to the differential saturation of Zeeman components in spectral lines) to a variation of the metallic absorption spectrum. Then we have examined how a local enhancement (or reduction) of the polarization produced by a dipolar magnetic field affects the Fourier spectrum of the observed polarization signal. Finally, we have designed an inversion program making possible the recovery - under certain restrictions - of the spatial modulations of the polarization generated by a dipole, which are necessary to explain `odd' polarimetric data. This program has been applied to the data gathered from three stars (49 Cam, β CrB, HD 71866). As far as the last star is concerned, none of the spatial modulations considered was able to reproduce the observations. On the contrary, good solutions are found for the other two. 
However, if one interprets the variations of the polarization as the result of abundance variations, which must correspond to a modulation of the absorption spectrum, a contradiction arises, especially for β CrB, because the observed spectral variability of these stars is too small to account for our computed maps. Therefore, non-canonical polarization diagrams must essentially be interpreted in terms of magnetic anomalies, not of abundance anomalies: in other words, the peculiarities of the polarization diagrams are likely to result mainly from departures of the magnetic configuration from the pure dipolar configuration.
Modelling parallel programs and multiprocessor architectures with AXE

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.

1991-01-01

AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior are described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.
Web Based Parallel Programming Workshop for Undergraduate Education.

ERIC Educational Resources Information Center

Marcus, Robert L.; Robertson, Douglass

Central State University (Ohio), under a contract with Nichols Research Corporation, has developed a World Wide Web based workshop on high performance computing entitled "IBM SP2 Parallel Programming Workshop." The research is part of the DoD (Department of Defense) High Performance Computing Modernization Program. The research…
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

NASA Technical Reports Server (NTRS)

Cooke, Daniel; Rushton, Nelson

2013-01-01

With the introduction of new parallel architectures like the Cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever.
In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
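As a rough illustration of the Normalize-Transpose idea named in the abstract above, the following Python sketch applies a scalar operator to arguments of mixed nesting depth: scalars are replicated to the length of the sequence arguments, and the operator is then applied element by element. This is only an analogy written for this listing. The helper name `normalize_transpose` is hypothetical, and the code does not reproduce SequenceL's actual semantics or its compiler.

```python
from operator import add, mul


def normalize_transpose(op, *args):
    """Apply a scalar operator to arguments that may be scalars or nested lists.

    Hypothetical helper for illustration only; not part of SequenceL.
    """
    seq_lengths = {len(a) for a in args if isinstance(a, list)}
    if not seq_lengths:
        return op(*args)  # every argument is a scalar: apply the operator directly
    if len(seq_lengths) != 1:
        raise ValueError("sequence arguments must have equal length")
    n = seq_lengths.pop()
    # Normalize: replicate each scalar so that all arguments have length n.
    normalized = [a if isinstance(a, list) else [a] * n for a in args]
    # Transpose: regroup into one argument tuple per element, then recurse.
    return [normalize_transpose(op, *column) for column in zip(*normalized)]


print(normalize_transpose(add, 10, [1, 2, 3]))          # [11, 12, 13]
print(normalize_transpose(mul, [[1, 2], [3, 4]], 2))    # [[2, 4], [6, 8]]
```

Each element of the result is computed independently of the others, which is the kind of structure a compiler could, in principle, distribute across cores without annotations from the programmer.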
Electron Cooling and Isotropization during Magnetotail Current Sheet Thinning: Implications for Parallel Electric Fields

NASA Astrophysics Data System (ADS)

Lu, San; Artemyev, A. V.; Angelopoulos, V.

2017-11-01

Magnetotail current sheet thinning is a distinctive feature of substorm growth phase, during which magnetic energy is stored in the magnetospheric lobes. Investigation of charged particle dynamics in such thinning current sheets is believed to be important for understanding the substorm energy storage and the current sheet destabilization responsible for substorm expansion phase onset. We use Time History of Events and Macroscale Interactions during Substorms (THEMIS) B and C observations in 2008 and 2009 at 18 - 25 RE to show that during magnetotail current sheet thinning, the electron temperature decreases (cooling), and the parallel temperature decreases faster than the perpendicular temperature, leading to a decrease of the initially strong electron temperature anisotropy (isotropization). This isotropization cannot be explained by pure adiabatic cooling or by pitch angle scattering. We use test particle simulations to explore the mechanism responsible for the cooling and isotropization. We find that during the thinning, a fast decrease of a parallel electric field (directed toward the Earth) can speed up the electron parallel cooling, causing it to exceed the rate of perpendicular cooling, and thus lead to isotropization, consistent with observation. If the parallel electric field is too small or does not change fast enough, the electron parallel cooling is slower than the perpendicular cooling, so the parallel electron anisotropy grows, contrary to observation. The same isotropization can also be accomplished by an increasing parallel electric field directed toward the equatorial plane. Our study reveals the existence of a large-scale parallel electric field, which plays an important role in magnetotail particle dynamics during the current sheet thinning process.
Instrumentation, performance visualization, and debugging tools for multiprocessors

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.

1991-01-01

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging and tuning parallel programs become intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.
Testing New Programming Paradigms with NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developed that aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made in defining new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy; however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3.
Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outermost loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
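The choice of parallelizing at the outermost loop level can be sketched in a few lines. The Python example below is an analogy only, not OpenMP and not code from the benchmarks: it distributes the outer loop over rows of a small grid across a thread pool, so that each task runs a complete inner loop. The function names and the smoothing kernel are assumptions made for illustration, and the grid is assumed to have at least two columns.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_row(grid, i):
    """Inner loop: average each interior point of row i with its two neighbours."""
    row = grid[i]
    interior = [(row[j - 1] + row[j] + row[j + 1]) / 3.0
                for j in range(1, len(row) - 1)]
    return [row[0]] + interior + [row[-1]]  # boundary values are left unchanged


def smooth_serial(grid):
    """Reference version: plain nested loops."""
    return [smooth_row(grid, i) for i in range(len(grid))]


def smooth_outer_parallel(grid, workers=4):
    """Distribute only the outer loop; each task is one whole row (coarse grain)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: smooth_row(grid, i), range(len(grid))))


grid = [[float(i * j) for j in range(8)] for i in range(6)]
print(smooth_outer_parallel(grid) == smooth_serial(grid))  # True
```

Because each task covers an entire row, scheduling overhead is paid once per row rather than once per grid point. In CPython the threads mainly illustrate the decomposition; a speedup for CPU-bound work would need processes or a compiled kernel.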
Subcritical crack growth in soda-lime glass in combined mode I and mode II loading

NASA Technical Reports Server (NTRS)

Singh, Dileep; Shetty, Dinesh K.

1990-01-01

Subcritical crack growth under mixed-mode loading was studied in soda-lime glass. Pure mode I, combined mode I and mode II, and pure mode II loadings were achieved in precracked disk specimens by loading in diametral compression at selected angles with respect to the symmetric radial crack. Crack growth was monitored by measuring the resistance changes in a microcircuit grid consisting of parallel, electrically conducting grid lines deposited on the surface of the disk specimens by photolithography. Subcritical crack growth rates in pure mode I, pure mode II, and combined mode I and mode II loading could be described by an exponential relationship between crack growth rate and an effective crack driving force derived from a mode I-mode II fracture toughness envelope. The effective crack driving force was based on an empirical representation of the noncoplanar strain energy release rate. Stress intensities for kinked cracks were assessed using the method of caustics and an initial decrease and a subsequent increase in the subcritical crack growth rates of kinked cracks were shown to correlate with the variations of the mode I and the mode II stress intensities.
Parallel computation with the force

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1985-01-01

A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
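The idea of a program that is correct for any number of processes can be sketched as follows. This is a minimal Python illustration using threads, under the assumption that every process runs the same body, takes loop iterations in a fixed interleaved pattern, and meets the others at a barrier. The name `run_force` and the example computation are hypothetical, and the code is not derived from the FORTRAN macros described in the abstract.

```python
import threading


def run_force(num_procs, data):
    """Sum the squares of `data` with `num_procs` identical workers."""
    partial = [0] * num_procs          # one private slot per process
    result = {}
    barrier = threading.Barrier(num_procs)

    def body(me):
        # Interleaved loop: process `me` takes iterations me, me + num_procs, ...
        for i in range(me, len(data), num_procs):
            partial[me] += data[i] * data[i]
        barrier.wait()                 # every partial sum is complete past this point
        if me == 0:                    # a single process combines the results
            result["total"] = sum(partial)

    threads = [threading.Thread(target=body, args=(p,)) for p in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


data = list(range(10))
print(run_force(1, data), run_force(4, data))  # 285 285
```

The body never refers to a fixed process count, so the same code gives the same answer whether one worker or many execute it.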
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

NASA Technical Reports Server (NTRS)

Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.

2003-01-01

Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.
Pure F-actin networks are distorted and branched by steps in the critical-point drying method.

PubMed

Resch, Guenter P; Goldie, Kenneth N; Hoenger, Andreas; Small, J Victor

2002-03-01

Elucidation of the ultrastructural organization of actin networks is crucial for understanding the molecular mechanisms underlying actin-based motility. Results obtained from cytoskeletons and actin comets prepared by the critical-point procedure, followed by rotary shadowing, support recent models incorporating actin filament branching as a main feature of lamellipodia and pathogen propulsion. Since actin branches were not evident in earlier images obtained by negative staining, we explored how these differences arise. Accordingly, we have followed the structural fate of dense networks of pure actin filaments subjected to steps of the critical-point drying protocol. The filament networks have been visualized in parallel by both cryo-electron microscopy and negative staining. Our results demonstrate the selective creation of branches and other artificial structures in pure F-actin networks by the critical-point procedure and challenge the reliability of this method for preserving the detailed organization of actin assemblies that drive motility. (c) 2002 Elsevier Science (USA).

76 FR 62808 - Pilot Program for Parallel Review of Medical Products

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-11

... voluntary participation in the pilot program, as well as the guiding principles the Agencies intend to... 57045), parallel review is intended to reduce the time between FDA marketing approval and CMS national...
Algorithms and programming tools for image processing on the MPP

NASA Technical Reports Server (NTRS)

Reeves, A. P.

1985-01-01

Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.
Execution models for mapping programs onto distributed memory parallel computers

NASA Technical Reports Server (NTRS)

Sussman, Alan

1992-01-01

The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.
Program Correctness, Verification and Testing for Exascale (Corvette)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sen, Koushik; Iancu, Costin; Demmel, James W

The goal of this project is to provide tools to assess the correctness of parallel programs written using hybrid parallelism. There is a dire lack of both theoretical and engineering know-how in the area of finding bugs in hybrid or large scale parallel programs, which our research aims to change. In the project we have demonstrated novel approaches in several areas: 1. Low overhead automated and precise detection of concurrency bugs at scale. 2. Using low overhead bug detection tools to guide speculative program transformations for performance. 3. Techniques to reduce the concurrency required to reproduce a bug using partial program restart/replay. 4. Techniques to provide reproducible execution of floating point programs. 5. Techniques for tuning the floating point precision used in codes.
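Why reproducibility of floating point programs is an issue at all can be shown with a small example. In parallel codes the order in which partial results are combined may change from run to run, and floating point addition is not associative. The Python sketch below contrasts a naive accumulation loop with the standard library's `math.fsum`, which returns a correctly rounded sum regardless of order. It illustrates the problem only and makes no claim about the techniques developed in the project; `naive_sum` is a name chosen here.

```python
import math


def naive_sum(xs):
    """Plain left-to-right accumulation, as a simple reduction loop would do."""
    total = 0.0
    for x in xs:
        total += x
    return total


order_a = [1e16, 1.0, -1e16]
order_b = [1e16, -1e16, 1.0]   # same values, different combination order

print(naive_sum(order_a), naive_sum(order_b))   # 0.0 1.0
print(math.fsum(order_a), math.fsum(order_b))   # 1.0 1.0
```

The naive loop loses the `1.0` when it is added to `1e16` first, so the result depends on ordering. An order-independent summation removes that source of run-to-run variation, at some extra cost per element.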
Parallel Computing Strategies for Irregular Algorithms

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Trace-Driven Debugging of Message Passing Programs

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Hood, Robert; Lopez, Louis; Bailey, David (Technical Monitor)

1998-01-01

In this paper we report on features added to a parallel debugger to simplify the debugging of parallel message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.
Exploiting Symmetry on Parallel Architectures.

NASA Astrophysics Data System (ADS)

Stiller, Lewis Benjamin

1995-01-01

This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
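The simplest instance of a group-equivariant matrix is a circulant matrix, which commutes with cyclic shifts. The Python sketch below uses the cyclic group rather than the dihedral or symmetric groups treated in the thesis, and all function names are chosen here for illustration. It shows that such a matrix is determined by a single generating vector, so the product can be computed as a cyclic convolution without ever storing the full matrix.

```python
def circulant(c):
    """Full n x n matrix with entries C[i][j] = c[(i - j) mod n]."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]


def matvec(m, x):
    """Ordinary dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def shift(x, k=1):
    """Cyclic shift: result[i] = x[(i - k) mod n]."""
    n = len(x)
    return [x[(i - k) % n] for i in range(n)]


def cyclic_convolve(c, x):
    """Same product as matvec(circulant(c), x), using only the generator c."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


c = [1, 2, 0, 3]
x = [4, 0, 1, 2]
m = circulant(c)
print(matvec(m, shift(x)) == shift(matvec(m, x)))   # True: equivariance
print(cyclic_convolve(c, x) == matvec(m, x))        # True: no full matrix needed
```

The equivariance check is what licenses the shortcut: once one output entry is known as a function of the input, the others follow by shifting, which is also why Fourier transforms over the group can diagonalize the operation.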
Crystal Orientation Controlled Photovoltaic Properties of Multilayer GaAs Nanowire Arrays.

PubMed

Han, Ning; Yang, Zai-Xing; Wang, Fengyun; Yip, SenPo; Li, Dapan; Hung, Tak Fu; Chen, Yunfa; Ho, Johnny C

2016-06-28

In recent years, despite significant progress in the synthesis, characterization, and integration of various nanowire (NW) material systems, crystal orientation controlled NW growth as well as real-time assessment of their growth-structure-property relationships still presents one of the major challenges in deploying NWs for practical large-scale applications. In this study, we propose, design, and develop a multilayer NW printing scheme for the determination of crystal orientation controlled photovoltaic properties of parallel GaAs NW arrays. By tuning the catalyst thickness and nucleation and growth temperatures in the two-step chemical vapor deposition, crystalline GaAs NWs with uniform, pure ⟨110⟩ and ⟨111⟩ orientations and other mixture ratios can be successfully prepared. Employing lift-off resists, three-layer NW parallel arrays can be easily attained for X-ray diffraction in order to evaluate their growth orientation along with the fabrication of NW parallel array based Schottky photovoltaic devices for the subsequent performance assessment. Notably, the open-circuit voltage of purely ⟨111⟩-oriented NW arrayed cells is far higher than that of ⟨110⟩-oriented NW arrayed counterparts, which can be interpreted by the different surface Fermi level pinning that exists on various NW crystal surface planes due to the different As dangling bond densities. All this indicates the profound effect of NW crystal orientation on physical and chemical properties of GaAs NWs, suggesting the careful NW design considerations for achieving optimal photovoltaic performances. The approach presented here could also serve as a versatile and powerful platform for in situ characterization of other NW materials.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

NASA Astrophysics Data System (ADS)

Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.

1995-03-01

PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Simunovic, S.; Zacharia, T.; Baltas, N.

1995-04-01

PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
Separation of photoactive conformers based on hindered diarylethenes: efficient modulation in photocyclization quantum yields.

PubMed

Li, Wenlong; Jiao, Changhong; Li, Xin; Xie, Yongshu; Nakatani, Keitaro; Tian, He; Zhu, Weihong

2014-04-25

Endowing both solvent independency and excellent thermal bistability, the benzobis(thiadiazole)-bridged diarylethene system provides an efficient approach to realize extremely high photocyclization quantum yields (Φo-c , up to 90.6 %) by both separating completely pure anti-parallel conformer and suppressing intramolecular charge transfer (ICT). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

NASA Astrophysics Data System (ADS)

Rostrup, Scott; De Sterck, Hans

2010-12-01

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs.
RAM: Tested on Problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.
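The structure of an explicit time step on a structured grid, which is what makes this class of solver attractive for data-parallel devices, can be sketched with a much simpler equation than the shallow water system. The Python example below applies a first-order upwind scheme to the 1D linear advection equation u_t + a u_x = 0 with periodic boundaries. It is a stand-in chosen for brevity, not the high-resolution scheme used in SWsolver, and the function name and parameters are assumptions.

```python
def upwind_step(u, a, dx, dt):
    """One explicit first-order upwind step for u_t + a u_x = 0 with a >= 0."""
    c = a * dt / dx                      # Courant number
    if not 0.0 <= c <= 1.0:
        raise ValueError("CFL condition violated: need 0 <= a*dt/dx <= 1")
    # Each new value depends only on old values of the cell and its left
    # neighbour, so every cell can be updated independently of the others.
    # u[i - 1] with i = 0 wraps to the last cell, giving a periodic boundary.
    return [u[i] - c * (u[i] - u[i - 1]) for i in range(len(u))]


u0 = [0.0, 1.0, 1.0, 0.0, 0.0]
print(upwind_step(u0, a=1.0, dx=0.5, dt=0.5))   # [0.0, 0.0, 1.0, 1.0, 0.0]
```

With a Courant number of exactly 1 the pulse moves one cell to the right per step. Since the update reads only the previous time level, the loop over cells maps directly onto data-parallel hardware, and splitting the grid across cluster nodes requires exchanging only the boundary cells between neighbours.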
Risk-Sensitive Control of Pure Jump Process on Countable Space with Near Monotone Cost

DOE Office of Scientific and Technical Information (OSTI.GOV)

Suresh Kumar, K., E-mail: suresh@math.iitb.ac.in; Pal, Chandan, E-mail: cpal@math.iitb.ac.in

2013-12-15

In this article, we study a risk-sensitive control problem with a controlled continuous time pure jump process on a countable space as state dynamics. We prove a multiplicative dynamic programming principle and elliptic and parabolic Harnack’s inequalities. Using the multiplicative dynamic programming principle and the Harnack’s inequalities, we prove the existence and a characterization of optimal risk-sensitive control under the near monotone condition.
ORCA Project: Research on high-performance parallel computer programming environments. Final report, 1 Apr-31 Mar 90

DOE Office of Scientific and Technical Information (OSTI.GOV)

Snyder, L.; Notkin, D.; Adams, L.

1990-03-31

This task relates to research on programming massively parallel computers. Previous work on the Ensemble concept of programming was extended and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensemble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels: composition of machine instructions, composition of processes, and composition of phases. It was applied to shared memory models of computations. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph.D. thesis was completed, and one book chapter and six conference proceedings were published.
Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

NASA Technical Reports Server (NTRS)

Dorband, John E.; Aburdene, Maurice F.

2002-01-01

Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C-based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.
The parallel programming of voluntary and reflexive saccades.

PubMed

Walker, Robin; McSorley, Eugene

2006-06-01

A novel two-step paradigm was used to investigate the parallel programming of consecutive, stimulus-elicited ('reflexive') and endogenous ('voluntary') saccades. The mean latency of voluntary saccades, made following the first reflexive saccades in two-step conditions, was significantly reduced compared to that of voluntary saccades made in the single-step control trials. The latency of the first reflexive saccades was modulated by the requirement to make a second saccade: first saccade latency increased when a second voluntary saccade was required in the opposite direction to the first saccade, and decreased when a second saccade was required in the same direction as the first reflexive saccade. A second experiment confirmed the basic effect and also showed that a second reflexive saccade may be programmed in parallel with a first voluntary saccade. The results support the view that voluntary and reflexive saccades can be programmed in parallel on a common motor map.
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.

2000-01-01

Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. 
Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon function calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline, the prototypical non-data-parallel algorithm, can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
78 FR 76628 - Pilot Program for Parallel Review of Medical Products; Extension of the Duration of the Program

Federal Register 2010, 2011, 2012, 2013, 2014

2013-12-18

...The Food and Drug Administration (FDA) and the Centers for Medicare and Medicaid Services (CMS) (the Agencies) are announcing the extension of the ``Pilot Program for Parallel Review of Medical Products.'' The Agencies have decided to continue the program as currently designed for an additional period of 2 years from the date of publication of this notice.
Towards Exascale Seismic Imaging and Inversion

NASA Astrophysics Data System (ADS)

Tromp, J.; Bozdag, E.; Lefebvre, M. P.; Smith, J. A.; Lei, W.; Ruan, Y.

2015-12-01

Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns tied to obtaining optimum performance. Several issues are currently being investigated by the HPC community. These include energy consumption, fault resilience, scalability of the current parallel paradigms, workflow management, I/O performance and feature extraction with large datasets. In this presentation, we focus on the last three issues. In the context of seismic imaging and inversion, in particular for simulations based on adjoint methods, workflows are well defined. They consist of a few collective steps (e.g., mesh generation or model updates) and of a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, obtaining a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts comprising it. The usual approach is to speed up the purely computational parts based on code optimization in order to reach higher FLOPS and better memory management. This still remains an important concern, but larger scale experiments show that the imaging workflow suffers from severe I/O bottlenecks. Such limitations occur both for purely computational data and seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). Parallel I/O libraries, namely HDF5 and ADIOS, are used to drastically reduce the cost of disk access. Parallel visualization tools, such as VisIt, are able to take advantage of ADIOS metadata to extract features and display massive datasets. 
Because large parts of the workflow are embarrassingly parallel, we are investigating the possibility of automating the imaging process with the integration of scientific workflow management tools, specifically Pegasus.

Integrated Task and Data Parallel Programming

NASA Technical Reports Server (NTRS)

Grimshaw, A. S.

1998-01-01

This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. 
Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
Integrated Task And Data Parallel Programming: Language Design

NASA Technical Reports Server (NTRS)

Grimshaw, Andrew S.; West, Emily A.

1998-01-01

This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. 
Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
Automatic Management of Parallel and Distributed System Resources

NASA Technical Reports Server (NTRS)

Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

1990-01-01

Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
Describing, using 'recognition cones'. [parallel-series model with English-like computer program]

NASA Technical Reports Server (NTRS)

Uhr, L.

1973-01-01

A parallel-serial 'recognition cone' model is examined, taking into account the model's ability to describe scenes of objects. An actual program is presented in an English-like language. The concept of a 'description' is discussed together with possible types of descriptive information. Questions regarding the level and the variety of detail are considered along with approaches for improving the serial representations of parallel systems.
PISCES: An environment for parallel scientific computation

NASA Technical Reports Server (NTRS)

Pratt, T. W.

1985-01-01

The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.
Eigensolver for a Sparse, Large Hermitian Matrix

NASA Technical Reports Server (NTRS)

Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

2003-01-01

A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
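For readers unfamiliar with the Lanczos algorithm named in this record, the following is a minimal serial sketch in Python, not the JPL program (which is written in C with MPI). It assumes a small, dense, real symmetric matrix (a special case of Hermitian) and uses full reorthogonalization for simplicity. The function names `lanczos`, `count_below`, and `largest_ritz_value` are hypothetical and chosen only for this illustration.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(A, v):
    # Dense product for brevity; a sparse code would store only nonzeros.
    return [dot(row, v) for row in A]

def lanczos(A, v0, m):
    """Run up to m Lanczos steps on the real symmetric matrix A.

    Returns the diagonal (alphas) and off-diagonal (betas) of the
    tridiagonal matrix T = Q^T A Q built from the Lanczos vectors Q.
    """
    norm = math.sqrt(dot(v0, v0))
    basis = [[x / norm for x in v0]]
    alphas, betas = [], []
    for j in range(m):
        w = matvec(A, basis[j])
        alphas.append(dot(w, basis[j]))
        # Full reorthogonalization (an assumption made here for robustness).
        for q in basis:
            h = dot(w, q)
            w = [wi - h * qi for wi, qi in zip(w, q)]
        beta = math.sqrt(dot(w, w))
        if j == m - 1 or beta < 1e-12:
            break  # requested size reached, or Krylov subspace exhausted
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    return alphas, betas

def count_below(alphas, betas, x):
    """Count eigenvalues of T that are smaller than x (Sturm sequence)."""
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        off = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = -1e-300  # avoid dividing by zero on the next step
        if d < 0.0:
            count += 1
    return count

def largest_ritz_value(alphas, betas):
    """Largest eigenvalue of T, by bisection between Gershgorin bounds."""
    k = len(alphas)
    radius = [(betas[i - 1] if i > 0 else 0.0) + (betas[i] if i < k - 1 else 0.0)
              for i in range(k)]
    lo = min(a - r for a, r in zip(alphas, radius))
    hi = max(a + r for a, r in zip(alphas, radius))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) == k:
            hi = mid  # every eigenvalue of T lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

As a usage example, for the 4-by-4 matrix with 2 on the diagonal and 1 elsewhere (eigenvalues 5, 1, 1, 1) and the start vector (1, 0, 0, 0), the iteration stops after two steps and the largest Ritz value approximates 5. Only a few extreme eigenvalues are recovered this way, which matches the "a few eigenvalues" use case in the abstract.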
Parallelization of elliptic solver for solving 1D Boussinesq model

NASA Astrophysics Data System (ADS)

Tarwidi, D.; Adytia, D.

2018-03-01

In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered-grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system arising from the numerical scheme for the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared-memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two test cases, the propagation of a solitary wave and of a standing wave, are used to evaluate the parallel program. The numerical results are verified against the analytical solutions for the solitary and standing waves. The best speedup for the solitary and standing wave test cases is about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, both obtained using 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
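The cyclic reduction step mentioned in this record can be sketched as follows. This is a serial, recursive Python illustration under stated assumptions, not the authors' OpenMP code: the function name `cyclic_reduction` and the array layout (sub-diagonal `a` with `a[0] = 0`, diagonal `b`, super-diagonal `c` with `c[-1] = 0`, right-hand side `d`) are choices made here, and no pivoting is done, so a well-conditioned (for example diagonally dominant) system is assumed.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]."""
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: eliminate the even-indexed unknowns from each odd-indexed
    # equation. Each loop iteration is independent of the others, which is
    # what a shared-memory version would run in parallel.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            a_next, c_next, d_next = a[i + 1], c[i + 1], d[i + 1]
        else:
            gamma = a_next = c_next = d_next = 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - gamma * a_next)
        rc.append(-gamma * c_next)
        rd.append(d[i] - alpha * d[i - 1] - gamma * d_next)

    # The odd-indexed unknowns satisfy a half-size tridiagonal system.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution: recover even-indexed unknowns from their neighbours.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Each level halves the number of unknowns, so about log2(n) levels are needed, and the iterations within a level do not depend on one another. That independence is the property that makes the method attractive for multithreading, although the work per level shrinks quickly, which limits the achievable speedup.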
3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

1997-12-31

The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS), satisfying the condition that numerical results be independent of the number of processors involved. Two basically different approaches to the structure of massively parallel computations have been developed. The first approach uses a 3D data matrix decomposition that is reconstructed at a temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on a 3D data matrix decomposition that is not reconstructed during a temporal cycle. The program was developed on the 8-processor CS MP-3 made at VNIIEF and was adapted to the massively parallel CS Meiko-2 at LLNL by the joint efforts of VNIIEF and LLNL staff. A large number of numerical experiments have been carried out with different numbers of processors, up to 256, and the efficiency of parallelization has been evaluated as a function of the number of processors and their parameters.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

2001-01-01

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
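The comparison at the heart of relative debugging can be illustrated with a small Python sketch. It is not the system described in this record: it assumes each run has already been reduced to a list of named checkpoints, that the parallel run stores each array as per-process blocks of a simple block distribution, and the names `gather` and `first_divergence` are hypothetical.

```python
import math

def gather(blocks):
    """Concatenate per-process blocks of a block-distributed 1-D array."""
    return [value for block in blocks for value in block]

def first_divergence(serial_trace, parallel_trace, rel_tol=1e-9):
    """Return (checkpoint, index, serial_value, parallel_value) for the first
    place where the two runs disagree, or None if they agree everywhere.

    serial_trace:   list of (checkpoint_name, values)
    parallel_trace: list of (checkpoint_name, list_of_per_process_blocks)
    """
    for (name, serial_vals), (p_name, blocks) in zip(serial_trace, parallel_trace):
        if name != p_name:
            raise ValueError(f"checkpoint mismatch: {name!r} vs {p_name!r}")
        parallel_vals = gather(blocks)
        if len(parallel_vals) != len(serial_vals):
            raise ValueError(f"size mismatch at checkpoint {name!r}")
        for i, (s, p) in enumerate(zip(serial_vals, parallel_vals)):
            if not math.isclose(s, p, rel_tol=rel_tol, abs_tol=1e-12):
                return name, i, s, p
    return None
```

Reporting only the first difference mirrors the idea of showing where the computations begin to differ; a tolerance is used because floating-point results can legitimately differ slightly when the order of operations changes between serial and parallel runs.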
Relative Debugging of Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

2002-01-01

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
Paralex: An Environment for Parallel Programming in Distributed Systems

DTIC Science & Technology

1991-12-07

distributed systems is comparable to assembly language programming for traditional sequential systems - the user must resort to low-level primitives ...to accomplish data encoding/decoding, communication, remote execution, synchronization, failure detection and recovery. It is our belief that... synchronization. Finally, composing parallel programs by interconnecting sequential computations allows automatic support for heterogeneity and fault tolerance
Interfacing Computer Aided Parallelization and Performance Analysis

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)

2003-01-01

When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.
LDRD final report on massively-parallel linear programming : the parPCx system.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

2005-02-01

This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project "Massively-Parallel Linear Programming". We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Integer and Combinatorial Optimizer). 
We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.
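As a reminder of what "a linear objective function subject to linear constraints" means, a common textbook standard form of an LP is written out below. The notation (constraint matrix A, vectors b and c, dual slacks s, diagonal matrices X and S) is assumed here for illustration, and the report's own formulation may differ, for example by allowing upper bounds on the variables.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & c^{T} x \\
\text{subject to} \quad & A x = b, \\
                        & x \ge 0 .
\end{aligned}
% In many primal-dual interior-point codes, each iteration reduces to a
% sparse symmetric system of "normal equations" for the dual step \Delta y:
\left( A \, X S^{-1} A^{T} \right) \Delta y = r,
\qquad X = \operatorname{diag}(x), \quad S = \operatorname{diag}(s).
```

The second display is one typical shape of the "sparse linear system" that dominates the cost of an interior-point iteration; r stands for a right-hand side assembled from the current residuals. How the constraint matrix A is distributed across processors determines how well this system can be assembled and solved in parallel.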
Improving operating room productivity via parallel anesthesia processing.

PubMed

Brown, Michael J; Subramanian, Arun; Curry, Timothy B; Kor, Daryl J; Moran, Steven L; Rohleder, Thomas R

2014-01-01

Parallel processing of regional anesthesia may improve operating room (OR) efficiency in patients undergoing upper extremity surgical procedures. The purpose of this paper is to evaluate whether performing regional anesthesia outside the OR in parallel increases total cases per day and improves efficiency and productivity. Data from all adult patients who underwent regional anesthesia as their primary anesthetic for upper extremity surgery over a one-year period were used to develop a simulation model. The model evaluated pure operating modes of regional anesthesia performed within and outside the OR in a parallel manner. The scenarios were used to evaluate how many surgeries could be completed in a standard work day (555 minutes) and, assuming a standard three cases per day, what the predicted end-of-day overtime was. Modeling results show that parallel processing of regional anesthesia increases the average cases per day for all surgeons included in the study. The average increase was 0.42 surgeries per day. Where it was assumed that three cases per day would be performed by all surgeons, the number of days going into overtime was reduced by 43 percent with parallel block. The overtime with parallel anesthesia was also projected to be 40 minutes less per day per surgeon. Key limitations include the assumption that all cases used regional anesthesia in the comparisons. Many days may have both regional and general anesthesia. Also, as a case study, single-center research may limit generalizability. Perioperative care providers should consider parallel administration of regional anesthesia where there is a desire to increase daily upper extremity surgical case capacity. Where there are sufficient resources to do parallel anesthesia processing, efficiency and productivity can be significantly improved. Simulation modeling can be an effective tool to show practice change effects at a system-wide level.
Role of a Modulator in the Synthesis of Phase-Pure NU-1000.

PubMed

Webber, Thomas E; Liu, Wei-Guang; Desai, Sai Puneet; Lu, Connie C; Truhlar, Donald G; Penn, R Lee

2017-11-15

NU-1000 is a robust, mesoporous metal-organic framework (MOF) with hexazirconium nodes ([Zr6O16H16]8+, referred to as oxo-Zr6 nodes) that can be synthesized by combining a solution of ZrOCl2·8H2O and a benzoic acid modulator in N,N-dimethylformamide with a solution of linker (1,3,6,8-tetrakis(p-benzoic acid)pyrene, referred to as H4TBAPy) and by aging at an elevated temperature. Typically, the resulting crystals are primarily composed of NU-1000 domains that crystallize with a more dense phase that shares structural similarity with NU-901, which is an MOF composed of the same linker molecules and nodes. Density differences between the two polymorphs arise from the differences in the node orientation: in NU-1000, the oxo-Zr6 nodes rotate 120° from node to node, whereas in NU-901, all nodes are aligned in parallel. Considering this structural difference leads to the hypothesis that changing the modulator from benzoic acid to a larger and more rigid biphenyl-4-carboxylic acid might lead to a stronger steric interaction between the modulator coordinating on the oxo-Zr6 node and misaligned nodes or linkers in the large pore and inhibit the growth of the more dense NU-901-like material, resulting in phase-pure NU-1000. Side-by-side reactions comparing the products of synthesis using benzoic acid or biphenyl-4-carboxylic acid as a modulator produce structurally heterogeneous crystals and phase-pure NU-1000 crystals, respectively. It can be concluded that the larger and more rigid biphenyl-4-carboxylate inhibits the incorporation of nodes with an alignment parallel to the neighboring nodes already residing in the crystal.
Observations of Large-Amplitude, Parallel, Electrostatic Waves Associated with the Kelvin-Helmholtz Instability by the Magnetospheric Multiscale Mission

NASA Technical Reports Server (NTRS)

Wilder, F. D.; Ergun, R. E.; Schwartz, S. J.; Newman, D. L.; Eriksson, S.; Stawarz, J. E.; Goldman, M. V.; Goodrich, K. A.; Gershman, D. J.; Malaspina, D.;

2016-01-01

On 8 September 2015, the four Magnetospheric Multiscale spacecraft encountered a Kelvin-Helmholtz unstable magnetopause near the dusk flank. The spacecraft observed periodic compressed current sheets, between which the plasma was turbulent. We present observations of large-amplitude (up to 100 mV/m) oscillations in the electric field. Because these oscillations are purely parallel to the background magnetic field, electrostatic, and below the ion plasma frequency, they are likely to be ion acoustic-like waves. These waves are observed in a turbulent plasma where multiple particle populations are intermittently mixed, including cold electrons with energies less than 10 eV. Stability analysis suggests a cold electron component is necessary for wave growth.

Complementary spin transistor using a quantum well channel.

PubMed

Park, Youn Ho; Choi, Jun Woo; Kim, Hyung-Jun; Chang, Joonyeon; Han, Suk Hee; Choi, Heon-Jin; Koo, Hyun Cheol

2017-04-20

In order to utilize the spin field effect transistor in logic applications, the development of two types of complementary transistors, which play roles of the n- and p-type conventional charge transistors, is an essential prerequisite. In this research, we demonstrate complementary spin transistors consisting of two types of devices, namely parallel and antiparallel spin transistors using InAs based quantum well channels and exchange-biased ferromagnetic electrodes. In these spin transistors, the magnetization directions of the source and drain electrodes are parallel or antiparallel, respectively, depending on the exchange bias field direction. Using this scheme, we also realize a complementary logic operation purely with spin transistors controlled by the gate voltage, without any additional n- or p-channel transistor.

A high-speed linear algebra library with automatic parallelism

NASA Technical Reports Server (NTRS)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

Evaluation, development, and characterization of superconducting materials for space applications

NASA Technical Reports Server (NTRS)

Thorpe, Arthur N.

1990-01-01

The anisotropic electromagnetic features of a grain-aligned YBa2Cu3O(x) bulk sample derived from a process of long-time partial melt growth were investigated by the measurements of direct current magnetization (at 77 K) and alternating current susceptibility as a function of temperature, with the fields applied parallel and perpendicular to the c axis, respectively. The extended Bean model was further studied and applied to explain the experimental results. Upon comparison of the grain-aligned sample with pure single crystal materials, it is concluded that because of the existence of more effective pinning sites in the grain-aligned sample, not only its critical current density perpendicular to the c axis is improved, but the one parallel to the c axis is improved even more significantly. The anisotropy in the critical current densities in the grain-aligned sample at 77 K is at least one to two orders of magnitude smaller than in the pure single crystal. The measurement of anisotropy of alternating current susceptibility as a function of temperature, especially its imaginary part, shows that there are still some residues of interlayer weak links in the grain-aligned samples, but they are quite different from and far less serious than the weak links in the sintered sample.

Time-Domain Pure-state Polarization Analysis of Surface Waves Traversing California

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, J; Walter, W R; Lay, T

A time-domain pure-state polarization analysis method is used to characterize surface waves traversing California parallel to the plate boundary. The method is applied to data recorded at four broadband stations in California from twenty-six large, shallow earthquakes which occurred since 1988, yielding polarization parameters such as the ellipticity, Euler angles, instantaneous periods, and wave incident azimuths. The earthquakes are located along the circum-Pacific margin and the ray paths cluster into two groups, with great-circle paths connecting stations MHC and PAS or CMB and GSC. The first path (MHC-PAS) is in the vicinity of the San Andreas Fault System (SAFS), and the second (CMB-GSC) traverses the Sierra Nevada Batholith parallel to and east of the SAFS. Both Rayleigh and Love wave data show refractions due to lateral velocity heterogeneities under the path, indicating that accurate phase velocity and attenuation analysis requires array measurements. The Rayleigh waves are strongly affected by low velocity anomalies beneath Central California, with ray paths bending eastward as waves travel toward the south, while Love waves are less affected, providing observables to constrain the depth extent of the anomalies. Strong lateral gradients in the lithospheric structure between the continent and the ocean are the likely cause of the path deflections.

Modeling and experimental study of oil/water contact angle on biomimetic micro-parallel-patterned self-cleaning surfaces of selected alloys used in water industry

NASA Astrophysics Data System (ADS)

Nickelsen, Simin; Moghadam, Afsaneh Dorri; Ferguson, J. B.; Rohatgi, Pradeep

2015-10-01

In the present study, the wetting behavior of surfaces of various common metallic materials used in the water industry, including C84400 brass, commercially pure aluminum (99.0% pure), Nickel-Molybdenum alloy (Hastelloy C22), and 316 Stainless Steel prepared by mechanical abrasion, was investigated, and the contact angles of several materials after mechanical abrasion were measured. A model to estimate roughness factor, Rf, and fraction of solid/oil interface, ƒso, for surfaces prepared by mechanical abrasion is proposed based on the assumption that abrasive particles acting on a metallic surface would result in scratches parallel to each other and each scratch would have a semi-round cross-section. The model geometrically describes the relation between sandpaper particle size and water/oil contact angle predicted by both the Wenzel and Cassie-Baxter contact type, which can then be used for comparison with experimental data to find which regime is active. Results show that brass and Hastelloy followed Cassie-Baxter behavior, aluminum followed Wenzel behavior and stainless steel exhibited a transition from Wenzel to Cassie-Baxter. Microstructural studies have also been done to rule out effects beyond the Wenzel and Cassie-Baxter theories such as size of structural details.
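
The regime comparison described in this abstract can be illustrated with the textbook forms of the two relations. This is only a sketch and not the authors' geometric model: it assumes the standard Wenzel relation cos θ_W = Rf cos θ_0 and the standard Cassie-Baxter relation for a composite interface whose second phase is fully non-wetting, cos θ_CB = ƒso (cos θ_0 + 1) − 1. The function names, the intrinsic angle θ_0 and all sample values are hypothetical.

```python
import math


def _clamped_acos_deg(c):
    # Clamp to [-1, 1] so rounding or strong roughness cannot break acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def wenzel_angle(theta0_deg, roughness_factor):
    """Apparent contact angle (degrees) from the textbook Wenzel relation."""
    return _clamped_acos_deg(roughness_factor * math.cos(math.radians(theta0_deg)))


def cassie_baxter_angle(theta0_deg, solid_fraction):
    """Apparent contact angle (degrees) from the textbook Cassie-Baxter relation."""
    return _clamped_acos_deg(
        solid_fraction * (math.cos(math.radians(theta0_deg)) + 1.0) - 1.0
    )


def closest_regime(measured_deg, theta0_deg, roughness_factor, solid_fraction):
    """Name the relation whose prediction lies nearer to a measured angle."""
    w = abs(measured_deg - wenzel_angle(theta0_deg, roughness_factor))
    cb = abs(measured_deg - cassie_baxter_angle(theta0_deg, solid_fraction))
    return "Wenzel" if w <= cb else "Cassie-Baxter"
```

With a hypothetical intrinsic angle of 120 degrees, `closest_regime(150.0, 120.0, 1.2, 0.3)` compares a measured 150 degrees against both predictions and reports which one is nearer.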

Dual and parallel postdoctoral training programs: implications for the osteopathic medical profession.

PubMed

Burkhart, Diane N; Lischka, Terri A

2011-04-01

Students in colleges of osteopathic medicine have several options when considering postdoctoral training programs. In addition to training programs approved solely by the American Osteopathic Association or accredited solely by the Accreditation Council for Graduate Medical Education (ACGME), students can pursue programs accredited by both organizations (ie, dually accredited programs) or osteopathic programs that occur side-by-side with ACGME programs (ie, parallel programs). In the present article, we report on the availability and growth of these 2 training options and describe their benefits and drawbacks for trainees and the osteopathic medical profession as a whole.

The paradigm compiler: Mapping a functional language for the connection machine

NASA Technical Reports Server (NTRS)

Dennis, Jack B.

1989-01-01

The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.

Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

1997-01-01

In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.

Multiple-stage pure phase encoding with biometric information

NASA Astrophysics Data System (ADS)

Chen, Wen

2018-01-01

In recent years, many optical systems have been developed for securing information, and optical encryption/encoding has attracted more and more attention due to the marked advantages, such as parallel processing and multiple-dimensional characteristics. In this paper, an optical security method is presented based on pure phase encoding with biometric information. Biometric information (such as fingerprint) is employed as security keys rather than plaintext used in conventional optical security systems, and multiple-stage phase-encoding-based optical systems are designed for generating several phase-only masks with biometric information. Subsequently, the extracted phase-only masks are further used in an optical setup for encoding an input image (i.e., plaintext). Numerical simulations are conducted to illustrate the validity, and the results demonstrate that high flexibility and high security can be achieved.

Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

DOE Office of Scientific and Technical Information (OSTI.GOV)

The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.
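
To make the dynamic-programming idea concrete, the sketch below solves the maximum weighted independent set problem on a tree, the simplest case in which the decomposition is the graph itself. It is an illustration only and does not use or reproduce the INDDGO API; the function name and the input layout (an adjacency dict plus a weight dict) are assumptions.

```python
def max_weight_independent_set(adj, weight, root=0):
    """Best total weight of an independent set in an undirected tree.

    adj maps each node to a list of neighbours; weight maps node -> weight.
    """
    # Iterative depth-first search: record parents and a pre-order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)

    take = {}  # best weight in v's subtree when v is in the set
    skip = {}  # best weight in v's subtree when v is left out
    for v in reversed(order):  # children are handled before their parent
        take[v] = weight[v]
        skip[v] = 0
        for u in adj[v]:
            if u != parent[v]:
                take[v] += skip[u]               # v chosen: children excluded
                skip[v] += max(take[u], skip[u])  # v free: children choose freely
    return max(take[root], skip[root])
```

Each node keeps a two-entry table (node taken or not taken). Dynamic programming over a general tree decomposition follows the same pattern, with one table entry per independent subset of each bag.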

Exploiting loop level parallelism in nonprocedural dataflow programs

NASA Technical Reports Server (NTRS)

Gokhale, Maya B.

1987-01-01

Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.

Tolerant (parallel) Programming

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Bailey, David H. (Technical Monitor)

1997-01-01

In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2³ is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

PubMed

Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

2014-10-30

Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.

New NAS Parallel Benchmarks Results

NASA Technical Reports Server (NTRS)

Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

1997-01-01

NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

Exploring types of play in an adapted robotics program for children with disabilities.

PubMed

Lindsay, Sally; Lam, Ashley

2018-04-01

Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, where all but one showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remainder of the children had mixed play types (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative, and less solitary play as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation: Educators and clinicians working with children who have disabilities should consider the potential of LEGO® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence their ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.

Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

USDA-ARS's Scientific Manuscript database

This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
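
An ADI sweep reduces to many independent tridiagonal systems, which is where Parallel Cyclic Reduction applies. The sketch below is a serial Python rendering of the generic PCR recurrence, not the CCHE2D or CUDA Fortran code. The function name and argument layout are hypothetical, and the sketch assumes a system that needs no pivoting (for example, a diagonally dominant one).

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction (PCR).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i];
    a[0] and c[-1] are ignored. Requires at least one equation.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0
    c[n - 1] = 0.0
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        # Every i is independent within one step, so on a GPU each i
        # would typically be handled by its own thread.
        for i in range(n):
            lo, hi = i - stride, i + stride
            if lo >= 0:
                alpha = -a[i] / b[lo]  # eliminates x[lo] from row i
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                gamma = -c[i] / b[hi]  # eliminates x[hi] from row i
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2  # remaining couplings are now twice as far apart
    # All off-diagonal couplings have left the index range: rows are decoupled.
    return [d[i] / b[i] for i in range(n)]
```

Each pass doubles the distance between coupled unknowns, so about log2(n) passes decouple all n equations. The price is more arithmetic than the serial Thomas algorithm.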

Contact allergy to air-exposed geraniol: clinical observations and report of 14 cases.

PubMed

Hagvall, Lina; Karlberg, Ann-Therese; Christensson, Johanna Bråred

2012-07-01

The fragrance terpene geraniol forms sensitizing compounds via autoxidation and skin metabolism. Geranial and neral, the two isomers of citral, are the major haptens formed in both of these activation pathways. To investigate whether testing with oxidized geraniol detects more cases of contact allergy than testing with pure geraniol. The pattern of reactions to pure and oxidized geraniol, and metabolites/autoxidation products, was studied to investigate the importance of autoxidation or cutaneous metabolism in contact allergy to geraniol. Pure and oxidized geraniol were tested at 2.0% petrolatum in 2227 and 2179 consecutive patients, respectively. In parallel, geranial, neral and citral were tested in 2152, 1626 and 1055 consecutive patients, respectively. Pure and oxidized geraniol gave positive patch test reactions in 0.13% and 0.55% of the patients, respectively. Eight of 11 patients with positive patch test reactions to oxidized geraniol also reacted to citral or its components. Relevance for the positive patch test reactions in relation to the patients' dermatitis was found in 11 of 14 cases. Testing with oxidized geraniol could detect more cases of contact allergy to geraniol. The reaction pattern of the 14 cases presented indicates that both autoxidation and metabolism could be important in sensitization to geraniol. © 2012 John Wiley & Sons A/S.

Multiprocessor smalltalk: Implementation, performance, and analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pallas, J.I.

1990-01-01

Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possible to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools (serialization, replication, and reorganization) adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.
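
The idea of building a parallel iterator out of futures can be sketched in another language. The Python analogue below is not Multiprocessor Smalltalk code: `parallel_collect` is a hypothetical name, and the thread pool stands in for Smalltalk processes. The `efficiency` helper assumes the usual definition, efficiency = speedup / processors, under which the reported 48% median efficiency on five processors corresponds to a speedup of about 2.4.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_collect(items, block, workers=5):
    """Apply `block` to every item concurrently; keep the original order.

    Each submit() returns a future; result() waits until that value exists.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(block, item) for item in items]
        return [f.result() for f in futures]


def efficiency(speedup, processors):
    """Parallel efficiency under the usual definition speedup / processors."""
    return speedup / processors
```

For example, `parallel_collect([1, 2, 3], lambda x: x * x)` returns the squares in input order. The sketch shows the control abstraction only and makes no claim about performance.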

In vitro apatite formation on nano-crystalline titania layer aligned parallel to Ti6Al4V alloy substrates with sub-millimeter gap.

PubMed

Hayakawa, Satoshi; Matsumoto, Yuko; Uetsuki, Keita; Shirosaki, Yuki; Osaka, Akiyoshi

2015-06-01

Pure titanium substrates were chemically oxidized with H2O2 and subsequently thermally oxidized at 400 °C in air to form an anatase-type titania layer on their surface. The chemically and thermally oxidized titanium substrate (CHT) was aligned parallel to the counter specimen such as commercially pure titanium (cpTi), titanium alloy (Ti6Al4V) popularly used as implant materials or Al substrate with 0.3-mm gap. Then, they were soaked in Kokubo's simulated body fluid (SBF, pH 7.4, 36.5 °C) for 7 days. XRD and SEM analysis showed that the in vitro apatite-forming ability of the contact surface of the CHT specimen decreased in the order: cpTi > Ti6Al4V > Al. EDX and XPS surface analysis showed that aluminum species were present on the contact surface of the CHT specimen aligned parallel to the counter specimen such as Ti6Al4V and Al. This result indicated that Ti6Al4V or Al specimens released the aluminum species into the SBF under the spatial gap. The released aluminum species might be positively or negatively charged in the SBF and thus can interact with calcium or phosphate species as well as titania layer, causing the suppression of the primary heterogeneous nucleation and growth of apatite on the contact surface of the CHT specimen under the spatial gap. The diffusion and adsorption of aluminum species derived from the half-sized counter specimen under the spatial gap resulted in two dimensionally area-selective deposition of apatite particles on the contact surfaces of the CHT specimen.

Effects of Hall current and electron temperature anisotropy on proton fire-hose instabilities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hau, L.-N.; Department of Physics, National Central University, Jhongli, Taiwan; Wang, B.-J.

The standard magnetohydrodynamic (MHD) theory predicts that the Alfvén wave may become fire-hose unstable for $\beta_\parallel - \beta_\perp > 2$. In this study, we examine the proton fire-hose instability (FHI) based on the gyrotropic two-fluid model, which incorporates the ion inertial effects arising from the Hall current and electron temperature anisotropy but neglects the electron inertia in the generalized Ohm's law. The linear dispersion relation is derived and analyzed which in the long wavelength approximation, $\lambda_i k \to 0$ or $\alpha_e = \mu_0 (p_{\parallel,e} - p_{\perp,e})/B^2 = 1$, recovers the ideal MHD model with separate temperature for ions and electrons. Here, $\lambda_i$ is the ion inertial length and $k$ is the wave number. For parallel propagation, both ion cyclotron and whistler waves become propagating and growing for $\beta_\parallel - \beta_\perp > 2 + \lambda_i^2 k^2 (\alpha_e - 1)^2 / 2$. For oblique propagation, the necessary condition for FHI remains to be $\beta_\parallel - \beta_\perp > 2$ and there exist one or two unstable fire-hose modes, which can be propagating and growing or purely growing. For large $\lambda_i k$ values, there exists no nearly parallel FHI leaving only oblique FHI and the effect of $\alpha_e > 1$ may greatly enhance the growth rate of parallel and oblique FHI. The magnetic field polarization of FHI may be reversed due to the sign change associated with $(\alpha_e - 1)$ and the purely growing FHI may possess linear polarization while the propagating and growing FHI may possess right-handed or left-handed polarization.
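
The parallel-propagation threshold quoted in this abstract can be evaluated directly. The snippet below is a small sketch that only transcribes the stated inequality. The function and argument names are hypothetical: `lambda_i_k` stands for the product of ion inertial length and wave number, and `alpha_e` for the electron anisotropy parameter defined in the abstract.

```python
def parallel_fhi_threshold(lambda_i_k, alpha_e):
    """Right-hand side of the parallel-propagation instability condition."""
    return 2.0 + (lambda_i_k ** 2) * (alpha_e - 1.0) ** 2 / 2.0


def is_parallel_fhi_unstable(beta_par, beta_perp, lambda_i_k, alpha_e):
    """True when beta_par - beta_perp exceeds the stated threshold."""
    return beta_par - beta_perp > parallel_fhi_threshold(lambda_i_k, alpha_e)
```

Setting `lambda_i_k` to 0 or `alpha_e` to 1 reduces the threshold to 2, the ideal MHD value given at the start of the abstract.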

Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. 
When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
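
The default even distribution described in this abstract can be sketched as follows. This is an illustration in Python, not PM code or the PM runtime. The function names are hypothetical, both counts are assumed to be at least one, and the case of equal task and processor counts (which the abstract does not spell out) is treated here as one processor per task.

```python
def split_evenly(count, parts):
    """Split range(count) into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(count, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def assign(num_tasks, num_processors):
    """Return a dict mapping each task to the list of processors it runs on."""
    if num_tasks <= num_processors:
        # Fewer tasks than processors: the processors are divided among the tasks.
        groups = split_evenly(num_processors, num_tasks)
        return {task: procs for task, procs in enumerate(groups)}
    # More tasks than processors: the tasks are divided among the processors.
    groups = split_evenly(num_tasks, num_processors)
    return {task: [proc] for proc, tasks in enumerate(groups) for task in tasks}
```

A nested parallel statement would apply the same split again, inside the processor subset owned by the enclosing task.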

Parallel transformation of K-SVD solar image denoising algorithm

NASA Astrophysics Data System (ADS)

Liang, Youwen; Tian, Yu; Li, Mei

2017-02-01

Images obtained by observing the sun through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version, following a data parallelism model. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance are tested after completion of the parallel algorithm. The speedup of the program is 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduce running time, and is easy to port to multi-core platforms.
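
The reported speedup can be put in context with two standard measures. The sketch below is not part of the paper: it assumes the usual definition of parallel efficiency (speedup divided by core count) and uses the Karp-Flatt metric as an estimate of the experimentally determined serial fraction. The function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def karp_flatt_serial_fraction(speedup, cores):
    """Karp-Flatt metric: serial fraction implied by a measured speedup."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


eff = parallel_efficiency(13.563, 16)
serial = karp_flatt_serial_fraction(13.563, 16)
print(f"efficiency = {eff:.3f}, implied serial fraction = {serial:.3f}")
```

For the figures quoted in the abstract this gives an efficiency of roughly 85% and an implied serial fraction of roughly 1%.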

Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

PubMed

Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

2016-01-01

Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.

Array processor architecture

NASA Technical Reports Server (NTRS)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not in lock-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
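
The synchronization behaviour described here, with independent execution punctuated by points that every processor must reach before any proceeds, corresponds to what is now usually called a barrier. The Python sketch below illustrates that idea with threads standing in for processors. It is not a model of the patented hardware, and the function name is hypothetical.

```python
import threading


def run_phases(num_workers, num_phases):
    """Run workers through phases, with a barrier between consecutive phases.

    Returns the (phase, worker) pairs in the order they were recorded.
    """
    barrier = threading.Barrier(num_workers)
    lock = threading.Lock()
    log = []

    def worker(wid):
        for phase in range(num_phases):
            with lock:
                log.append((phase, wid))  # stands in for independent work
            barrier.wait()  # nobody starts the next phase until all arrive

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Within a phase the workers may record their entries in any order, but every entry of one phase precedes every entry of the next.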

A parallel solver for huge dense linear systems

NASA Astrophysics Data System (ADS)

Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

2011-11-01

HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) that helps scientists and engineers solve very large dense linear systems in parallel. The API uses parallelism to solve these systems efficiently on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It uses out-of-core strategies that exploit secondary memory in order to solve huge linear systems, on the order of O(100,000) equations. The API is based on the parallel linear algebra library PLAPACK and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface that hides almost all the technical aspects of the parallel execution of the code and of the use of secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes, and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors.

New version program summary
Program title: Huge Dense System Solver (HDSS)
Catalogue identifier: AEHU_v1_1
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 87 062
No. of bytes in distributed program, including test data, etc.: 1 069 110
Distribution format: tar.gz
Programming language: Fortran90, C
Computer: Parallel architectures: multiprocessors, computer clusters
Operating system: Linux/Unix
Has the code been vectorized or parallelized?: Yes, includes MPI primitives.
RAM: Tested for up to 190 GB
Classification: 6.5
External routines: MPI (http://www.mpi-forum.org/), BLAS (http://www.netlib.org/blas/), PLAPACK (http://www.cs.utexas.edu/~plapack/), POOCLAPACK (ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution).
Catalogue identifier of previous version: AEHU_v1_0
Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533
Does the new version supersede the previous version?: Yes
Nature of problem: Huge-scale dense systems of linear equations, Ax=B, beyond standard LAPACK capabilities.
Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary storage algorithms when the available main memory is insufficient.
Reasons for new version: Many applications need to guarantee high accuracy in the solution of very large linear systems, which can be achieved by using double-precision arithmetic.
Summary of revisions: Version 1.1 can be used to solve linear systems using double-precision arithmetic. It includes a new version of the initialization routine. The user can choose the kind of arithmetic and the values of several parameters of the environment.
Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.
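As a rough illustration of the solution method named in the HDSS summary above (LU factorization applied to Ax=B with several right-hand sides), the following Python sketch solves a small dense system in memory. It is not HDSS, PLAPACK, or POOCLAPACK code: the function name `lu_solve` is hypothetical, the partial-pivoting strategy is an assumption, and nothing here is parallel or out-of-core.

```python
def lu_solve(a, b):
    """Solve A X = B by in-place LU factorization with partial pivoting.

    a: n x n matrix as a list of rows; b: n x m matrix (m right-hand sides).
    Illustrative serial sketch only; the inputs are not modified.
    """
    n = len(a)
    lu = [row[:] for row in a]   # working copy that ends up holding L and U
    x = [row[:] for row in b]    # right-hand sides, permuted together with A
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]               # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    for c in range(len(x[0])):
        # Forward substitution with the unit lower-triangular factor L.
        for i in range(n):
            for j in range(i):
                x[i][c] -= lu[i][j] * x[j][c]
        # Back substitution with the upper-triangular factor U.
        for i in reversed(range(n)):
            for j in range(i + 1, n):
                x[i][c] -= lu[i][j] * x[j][c]
            x[i][c] /= lu[i][i]
    return x
```

Factoring once and reusing the factors for every column of B is what makes a large number of right-hand sides affordable; the cubic-cost factorization is paid a single time.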
Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

NASA Astrophysics Data System (ADS)

Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

2016-08-01

We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9). These are currently limited by the time for the calculation of the domain decomposition and communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
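For readers unfamiliar with the "simple, sequential and unoptimized" O(N^2) baseline that the FDPS abstract refers to, a minimal Python sketch of direct-summation gravity follows. It does not use FDPS or its API: the function name `accelerations`, the gravitational constant `g`, and the softening length `eps` are assumptions introduced only for illustration.

```python
import math

def accelerations(pos, mass, g=1.0, eps=0.0):
    """Gravitational acceleration on every particle by direct pairwise summation.

    pos: list of [x, y, z] positions; mass: list of masses.
    Cost is O(N^2) because every particle visits every other particle.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            d = [pos[j][k] - pos[i][k] for k in range(3)]   # vector from i to j
            r2 = sum(c * c for c in d) + eps * eps          # softened squared distance
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += g * mass[j] * d[k] * inv_r3
    return acc
```

Tree codes such as Barnes-Hut replace the inner loop over all `j` with a walk over a hierarchy of cells, which is the kind of algorithmic machinery the abstract says the framework supplies.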
Concurrency-based approaches to parallel programming

NASA Technical Reports Server (NTRS)

Kale, L.V.; Chrisochoides, N.; Kohl, J.; Yelick, K.

1995-01-01

The inevitable transition to parallel programming can be facilitated by appropriate tools, including languages and libraries. After describing the needs of applications developers, this paper presents three specific approaches aimed at development of efficient and reusable parallel software for irregular and dynamic-structured problems. A salient feature of all three approaches is their exploitation of concurrency within a processor. Benefits of individual approaches such as these can be leveraged by an interoperability environment which permits modules written using different approaches to co-exist in a single application.
Reliability models for dataflow computer systems

NASA Technical Reports Server (NTRS)

Kavi, K. M.; Buckles, B. P.

1985-01-01

The demands for concurrent operation within a computer system and the representation of parallelism in programming languages have yielded a new form of program representation known as data flow (DENN 74, DENN 75, TREL 82a). A new model based on data flow principles for parallel computations and parallel computer systems is presented. Necessary conditions for liveness and deadlock freeness in data flow graphs are derived. The data flow graph is used as a model to represent asynchronous concurrent computer architectures including data flow computers.
Method for resource control in parallel environments using program organization and run-time support

NASA Technical Reports Server (NTRS)

Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

2001-01-01

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
Method for resource control in parallel environments using program organization and run-time support

NASA Technical Reports Server (NTRS)

Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

1999-01-01

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
Visualization Software for VisIT Java Client

DOE Office of Scientific and Technical Information (OSTI.GOV)

Billings, Jay Jay; Smith, Robert W

The VisIT Java Client (JVC) library is a lightweight thin client that is designed and written purely in the native language of Java (the Python & JavaScript versions of the library use the same concept) and communicates with any new unmodified standalone version of VisIT, a high performance computing parallel visualization toolkit, over traditional or web sockets and dynamically determines capabilities of the running VisIT instance whether local or remote.
Hydrogen Assisted Cracking of High Strength Alloys

DTIC Science & Technology

2003-08-01

maraging steels (Dautovich and Floreen, 1973, 1977; Gerberich et al., 1988; Yamaguchi, et al., 1997). This behavior is typically described by a...transgranular. A similar maximum in IG cracking susceptibility near the free corrosion potential was reported for 18Ni Maraging steel in neutral NaCl... steel in 133 kPa pure H2 parallels the behavior of AISI 4340 and the 18 Ni maraging steels , particularly in terms of a low temperature activation
Parallel community climate model: Description and user's guide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Drake, J.B.; Flanery, R.E.; Semeraro, B.D.

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.
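The patch decomposition described in the PCCM2 abstract can be pictured with a small Python sketch. This is not PCCM2 code: the function name `decompose`, the rectangular latitude-longitude blocks, and the simplification of one patch per processor rank are all assumptions made for illustration (the report assigns each processor a subset of patches).

```python
def decompose(nlat, nlon, plat, plon):
    """Split an nlat x nlon grid into plat x plon rectangular patches.

    Returns {rank: (lat_start, lat_stop, lon_start, lon_stop)} with
    half-open index ranges, one patch per (hypothetical) processor rank.
    """
    def split(n, parts):
        # Spread n points over `parts` blocks whose sizes differ by at most one.
        base, extra = divmod(n, parts)
        bounds, start = [], 0
        for p in range(parts):
            size = base + (1 if p < extra else 0)
            bounds.append((start, start + size))
            start += size
        return bounds

    patches = {}
    for i, (la0, la1) in enumerate(split(nlat, plat)):
        for j, (lo0, lo1) in enumerate(split(nlon, plon)):
            patches[i * plon + j] = (la0, la1, lo0, lo1)
    return patches
```

Column physics needs only the grid points inside a rank's own patch, which is why that part of the model parallelizes without communication; the transforms used by the dynamics are where data must move between ranks.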
The 2nd Symposium on the Frontiers of Massively Parallel Computations

NASA Technical Reports Server (NTRS)

Mills, Ronnie (Editor)

1988-01-01

Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
The Goddard Space Flight Center Program to develop parallel image processing systems

NASA Technical Reports Server (NTRS)

Schaefer, D. H.

1972-01-01

Parallel image processing, defined as image processing in which all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered as parallel image processing techniques.
Parallel Volunteer Learning during Youth Programs

ERIC Educational Resources Information Center

Lesmeister, Marilyn K.; Green, Jeremy; Derby, Amy; Bothum, Candi

2012-01-01

Lack of time is a hindrance for volunteers to participate in educational opportunities, yet volunteer success in an organization is tied to the orientation and education they receive. Meeting diverse educational needs of volunteers can be a challenge for program managers. Scheduling a Volunteer Learning Track for chaperones that is parallel to a…
Mechanism to support generic collective communication across a variety of programming models

DOEpatents

Almasi, Gheorghe [Ardsley, NY]; Dozsa, Gabor [Ardsley, NY]; Kumar, Sameer [White Plains, NY]

2011-07-19

A system and method for supporting collective communications on a plurality of processors that use different parallel programming paradigms, in one aspect, may comprise a schedule defining one or more tasks in a collective operation, an executor that executes the task, a multisend module to perform one or more data transfer functions associated with the tasks, and a connection manager that controls one or more connections and identifies an available connection. The multisend module uses the available connection in performing the one or more data transfer functions. A plurality of processors that use different parallel programming paradigms can use a common implementation of the schedule module, the executor module, the connection manager and the multisend module via a language adaptor specific to a parallel programming paradigm implemented on a processor.
An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.

2009-08-01

Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application tasks with dependences. These applications exhibit both task- and data-parallelism, and combining these two (also called mixed parallelism) has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks and the inter-task data communication volumes. A locality-conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.
Programming with Intervals

NASA Astrophysics Data System (ADS)

Matsakis, Nicholas D.; Gross, Thomas R.

Intervals are a new, higher-level primitive for parallel programming with which programmers directly construct the program schedule. Programs using intervals can be statically analyzed to ensure that they do not deadlock or contain data races. In this paper, we demonstrate the flexibility of intervals by showing how to use them to emulate common parallel control-flow constructs like barriers and signals, as well as higher-level patterns such as bounded-buffer producer-consumer. We have implemented intervals as a publicly available library for Java and Scala.
Classification of hyperspectral imagery using MapReduce on a NVIDIA graphics processing unit (Conference Presentation)

NASA Astrophysics Data System (ADS)

Ramirez, Andres; Rahnemoonfar, Maryam

2017-04-01

A hyperspectral image is a multidimensional, data-rich image consisting of hundreds of spectral dimensions. Analyzing the spectral and spatial information of such an image with linear and non-linear algorithms results in high computation time. To overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that can help analyze a hyperspectral image through the use of parallel hardware and a parallel programming model that is simpler to handle than other low-level parallel programming models. Additionally, Hadoop was used as an open-source version of the MapReduce parallel programming model. This research compared classification accuracy and timing results between the Hadoop and GPU systems and tested them on the following test cases: the CPU and GPU test case, a CPU test case and a test case where no dimensional reduction was applied.
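The MapReduce pattern mentioned in the abstract above can be sketched in a few lines of serial Python. The abstract does not say which classifier was used, so the nearest-centroid rule here is purely an assumption, as are the function names `nearest_class`, `map_phase`, and `reduce_phase`; no Hadoop or GPU API is involved.

```python
from collections import defaultdict

def nearest_class(spectrum, centroids):
    """Label of the centroid closest to a pixel spectrum (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((s - c) ** 2 for s, c in zip(spectrum, centroids[label])),
    )

def map_phase(pixels, centroids):
    """Map step: emit one (class_label, 1) pair per pixel, independently of other pixels."""
    return [(nearest_class(p, centroids), 1) for p in pixels]

def reduce_phase(pairs):
    """Reduce step: group the pairs by class label and sum their values."""
    counts = defaultdict(int)
    for label, value in pairs:
        counts[label] += value
    return dict(counts)
```

Because each pixel is classified without reference to any other, the map step is the part that a cluster or a GPU can spread across many workers.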
File concepts for parallel I/O

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1989-01-01

The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations that use multiple storage devices is proposed. Problem areas are also identified and discussed.
Program For Parallel Discrete-Event Simulation

NASA Technical Reports Server (NTRS)

Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

1991-01-01

User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
LLMapReduce: Multi-Lingual Map-Reduce for Supercomputing Environments

DTIC Science & Technology

2015-11-20

1990s. Popularized by Google [36] and Apache Hadoop [37], map-reduce has become a staple technology of the ever-growing big data community...Lexington, MA, U.S.A Abstract— The map-reduce parallel programming model has become extremely popular in the big data community. Many big data...to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming
Motion control of planar parallel robot using the fuzzy descriptor system approach.

PubMed

Vermeiren, Laurent; Dequidt, Antoine; Afroun, Mohamed; Guerra, Thierry-Marie

2012-09-01

This work presents the control of a two-degree-of-freedom parallel robot manipulator. A quasi-LPV approach, based on the so-called TS fuzzy model and LMI constraint problems, is used. Moreover, in this context a way to derive interesting control laws is to keep the descriptor form of the mechanical system. Therefore, new LMI problems have to be defined that help to reduce the conservatism of the usual results. Some relaxations are also proposed to leave the pure quadratic stability/stabilization framework. A comparison study between the classical control strategies from robotics and the control design using TS fuzzy descriptor models is carried out to show the interest of the proposed approach. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

CFD Research, Parallel Computation and Aerodynamic Optimization

NASA Technical Reports Server (NTRS)

Ryan, James S.

1995-01-01

During the last five years, CFD has matured substantially. Pure CFD research remains to be done, but much of the focus has shifted to integration of CFD into the design process. The work under these cooperative agreements reflects this trend. The recent work, and work which is planned, is designed to enhance the competitiveness of the US aerospace industry. CFD and optimization approaches are being developed and tested, so that the industry can better choose which methods to adopt in their design processes. The range of computer architectures has been dramatically broadened, as the assumption that only huge vector supercomputers could be useful has faded. Today, researchers and industry can trade off time, cost, and availability, choosing vector supercomputers, scalable parallel architectures, networked workstations, or heterogeneous combinations of these to complete required computations efficiently.
Creating a Parallel Version of VisIt for Microsoft Windows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Whitlock, B J; Biagas, K S; Rawson, P L

2011-12-07

VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers from modest desktops up to massively parallel clusters. VisIt is composed of a set of cooperating programs. All programs can be run locally or in client/server mode in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.
Debugging Fortran on a shared memory machine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Allen, T.R.; Padua, D.A.

1987-01-01

Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.
A portable MPI-based parallel vector template library

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.

1995-01-01

This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
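The three components named in the abstract above (a generic collection class, generic algorithms over collections, and algebraic combining functions) can be mimicked in a short Python sketch. This is not the C++ library described in the paper: the class name `ChunkedCollection`, its methods, and the cyclic distribution of elements are hypothetical, and the "processors" are simulated by plain lists in one process.

```python
from functools import reduce

class ChunkedCollection:
    """A collection whose elements are split into per-processor chunks (simulated as lists)."""

    def __init__(self, items, nprocs):
        items = list(items)
        # Cyclic distribution: element i goes to simulated processor i % nprocs.
        self.chunks = [items[p::nprocs] for p in range(nprocs)]

    def apply(self, fn):
        """Generic algorithm: apply fn to every element, chunk by chunk."""
        result = ChunkedCollection([], len(self.chunks))
        result.chunks = [[fn(x) for x in chunk] for chunk in self.chunks]
        return result

    def combine(self, op, identity):
        """Generic reduction: combine within each chunk, then across chunks.

        op must be associative and commutative, because the cyclic layout
        changes the order in which elements meet.
        """
        partials = [reduce(op, chunk, identity) for chunk in self.chunks]
        return reduce(op, partials, identity)
```

For example, `ChunkedCollection(range(1, 11), 3).apply(lambda x: x * x).combine(lambda a, b: a + b, 0)` sums the squares of 1 through 10; in a message-passing setting only the per-chunk partial results would need to be communicated.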
Parallel computation and the basis system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, G.R.

1993-05-01

A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communications costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

2002-01-01

The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
The Automated Instrumentation and Monitoring System (AIMS) reference manual

NASA Technical Reports Server (NTRS)

Yan, Jerry; Hontalas, Philip; Listgarten, Sherry

1993-01-01

Whether a researcher is designing the 'next parallel programming paradigm,' another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of execution traces can help computer designers and software architects to uncover system behavior and to take advantage of specific application characteristics and hardware features. A software tool kit that facilitates performance evaluation of parallel applications on multiprocessors is described. The Automated Instrumentation and Monitoring System (AIMS) has four major software components: a source code instrumentor which automatically inserts active event recorders into the program's source code before compilation; a run-time performance-monitoring library, which collects performance data; a trace file animation and analysis tool kit which reconstructs program execution from the trace file; and a trace post-processor which compensates for data collection overhead. Besides being used as a prototype for developing new techniques for instrumenting, monitoring, and visualizing parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware test beds to evaluate their impact on user productivity. Currently, AIMS instrumentors accept FORTRAN and C parallel programs written for Intel's NX operating system on the iPSC family of multicomputers. A run-time performance-monitoring library for the iPSC/860 is included in this release. We plan to release monitors for other platforms (such as PVM and TMC's CM-5) in the near future. Performance data collected can be graphically displayed on workstations (e.g. Sun Sparc and SGI) supporting X-Windows (in particular, X11R5, Motif 1.1.3).
Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hirata, So

2003-11-20

We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes common binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirement, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).
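Choosing a binary contraction order with minimal operation cost, as the TCE abstract describes, has a well-known simplified analogue: ordering a chain of matrix products. The Python sketch below solves only that special case with the textbook dynamic program. It is not TCE's algorithm, it ignores memory cost and symmetry, and the function name `contraction_order` and the labels `T0`, `T1`, ... are hypothetical.

```python
def contraction_order(dims):
    """Cheapest pairwise order for a chain of matrices of shapes dims[i] x dims[i+1].

    Returns (scalar_multiplication_count, parenthesization_string).
    """
    n = len(dims) - 1                          # number of matrices in the chain
    cost = [[0] * n for _ in range(n)]         # cost[i][j]: best cost for T_i..T_j
    split = [[0] * n for _ in range(n)]        # split[i][j]: where the last product cuts
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j], split[i][j] = c, k

    def show(i, j):
        # Rebuild the chosen order as nested parentheses.
        if i == j:
            return f"T{i}"
        k = split[i][j]
        return f"({show(i, k)} {show(k + 1, j)})"

    return cost[0][n - 1], show(0, n - 1)
```

For shapes 10x30, 30x5, and 5x60, multiplying the first two matrices first costs 4500 scalar multiplications, against 27000 for the other order, so the choice of binary order alone changes the work by a factor of six.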
Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi

Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.
Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

DOE PAGES

Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi; ...

2016-06-01

Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.
Parent-Child Parallel-Group Intervention for Childhood Aggression in Hong Kong

ERIC Educational Resources Information Center

Fung, Annis L. C.; Tsang, Sandra H. K. M.

2006-01-01

This article reports the original evidence-based outcome study on parent-child parallel group-designed Anger Coping Training (ACT) program for children aged 8-10 with reactive aggression and their parents in Hong Kong. This research program involved experimental and control groups with pre- and post-comparison. Quantitative data collection…
Parallel computer vision

DOE Office of Scientific and Technical Information (OSTI.GOV)

Uhr, L.

1987-01-01

This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.
Parallel Performance of a Combustion Chemistry Simulation

DOE PAGES

Skinner, Gregg; Eigenmann, Rudolf

1995-01-01

We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.
Algorithms and programming tools for image processing on the MPP, part 2

NASA Technical Reports Server (NTRS)

Reeves, Anthony P.

1986-01-01

A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites.
Scalable Unix commands for parallel processors : a high-performance implementation.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ong, E.; Lusk, E.; Gropp, W.

2001-06-22

We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.
Parallel language constructs for tensor product computations on loosely coupled architectures

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Van Rosendale, John

1989-01-01

A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The authors focus on tensor product array computations, a simple but important class of numerical algorithms. They consider first the problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, and then look at how such parallel kernels can be combined to form parallel tensor product algorithms.
A CS1 pedagogical approach to parallel thinking

NASA Astrophysics Data System (ADS)

Rague, Brian William

Almost all collegiate programs in Computer Science offer an introductory course in programming primarily devoted to communicating the foundational principles of software design and development. The ACM designates this introduction to computer programming course for first-year students as CS1, during which methodologies for solving problems within a discrete computational context are presented. Logical thinking is highlighted, guided primarily by a sequential approach to algorithm development and made manifest by typically using the latest, commercially successful programming language. In response to the most recent developments in accessible multicore computers, instructors of these introductory classes may wish to include training on how to design workable parallel code. Novel issues arise when programming concurrent applications which can make teaching these concepts to beginning programmers a seemingly formidable task. Student comprehension of design strategies related to parallel systems should be monitored to ensure an effective classroom experience. This research investigated the feasibility of integrating parallel computing concepts into the first-year CS classroom. To quantitatively assess student comprehension of parallel computing, an experimental educational study using a two-factor mixed group design was conducted to evaluate two instructional interventions in addition to a control group: (1) topic lecture only, and (2) topic lecture with laboratory work using a software visualization Parallel Analysis Tool (PAT) specifically designed for this project. A new evaluation instrument developed for this study, the Perceptions of Parallelism Survey (PoPS), was used to measure student learning regarding parallel systems. 
The results from this educational study show a statistically significant main effect among the repeated measures, implying that student comprehension levels of parallel concepts as measured by the PoPS improve immediately after the delivery of any initial three-week CS1 level module when compared with student comprehension levels just prior to starting the course. Survey results measured during the ninth week of the course reveal that performance levels remained high compared to pre-course performance scores. A second result produced by this study reveals no statistically significant interaction effect between the intervention method and student performance as measured by the evaluation instrument over three separate testing periods. However, visual inspection of survey score trends and the low p-value generated by the interaction analysis (0.062) indicate that further studies may verify improved concept retention levels for the lecture w/PAT group.
YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste

Modern embedded systems include hundreds of cores. Because of the difficulty in providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism, while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets difficult to partition, unpredictable memory accesses, unbalanced control flow and fine grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex and highly-parallel architectures and the high level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework for the automatic parallelization of irregular applications on modern MPSoCs based on LLVM. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.
PISCES 2 users manual

NASA Technical Reports Server (NTRS)

Pratt, Terrence W.

1987-01-01

PISCES 2 is a programming environment and set of extensions to Fortran 77 for parallel programming. It is intended to provide a basis for writing programs for scientific and engineering applications on parallel computers in a way that is relatively independent of the particular details of the underlying computer architecture. This user's manual provides a complete description of the PISCES 2 system as it is currently implemented on the 20 processor Flexible FLEX/32 at NASA Langley Research Center.

A language comparison for scientific computing on MIMD architectures

NASA Technical Reports Server (NTRS)

Jones, Mark T.; Patrick, Merrell L.; Voigt, Robert G.

1989-01-01

Choleski's method for solving banded symmetric, positive definite systems is implemented on a multiprocessor computer using three FORTRAN based parallel programming languages, the Force, PISCES and Concurrent FORTRAN. The capabilities of the language for expressing parallelism and their user friendliness are discussed, including readability of the code, debugging assistance offered, and expressiveness of the languages. The performance of the different implementations is compared. It is argued that PISCES, using the Force for medium-grained parallelism, is the appropriate choice for programming Choleski's method on the multiprocessor computer, Flex/32.
Code Parallelization with CAPO: A User Manual

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

2001-01-01

A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.
Thread concept for automatic task parallelization in image analysis

NASA Astrophysics Data System (ADS)

Lueckenhaus, Maximilian; Eckstein, Wolfgang

1998-09-01

Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context but may process different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.
Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

1994-01-01

The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine, for a problem size of 1024 K equations (256 K grid points).
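As a rough illustration of the static domain decomposition described in the abstract above, the following sketch splits a one-dimensional range of grid points into contiguous, near-equal blocks, one per node. The function name `block_partition` is hypothetical, the real solver partitions a multi-dimensional flow domain, and reading "256 K" as 256 × 1024 is an assumption.

```python
def block_partition(n_points, n_nodes):
    """Split n_points grid points into n_nodes contiguous, near-equal blocks.

    Returns a list of half-open (start, stop) ranges, one per node. When the
    points do not divide evenly, the first few nodes get one extra point.
    """
    base, extra = divmod(n_points, n_nodes)
    ranges = []
    start = 0
    for rank in range(n_nodes):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Assumed reading of the abstract's figures: 256 K = 256 * 1024 grid points
# distributed over 512 nodes.
parts = block_partition(256 * 1024, 512)
```

Because the partition is fixed before the solve begins, each node in an SPMD program can compute its own range from its rank alone, with no communication needed to agree on ownership.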
Multiple pure tone noise generated by fans at supersonic tip speeds

NASA Technical Reports Server (NTRS)

Sofrin, T. G.; Pickett, G. F.

1974-01-01

The existence of clusters of pure tones at integral multiples of shaft speed has been noted for supersonic-tip-speed operation of fans and compressors. A continuing program to explore this phenomenon, often called combination-tone noise, has been in effect for several years. This paper reviews the research program, which involves a wide range of engines, compressor rigs, and special apparatus. Elements of the aerodynamics of the blade-associated shock waves are outlined and causes of blade-to-blade shock inequalities, responsible for the multiple tones, are described.
PIPS-SBB: A Parallel Distributed-Memory Branch-and-Bound Algorithm for Stochastic Mixed-Integer Programs

DOE PAGES

Munguia, Lluis-Miquel; Oxberry, Geoffrey; Rajan, Deepak

2016-05-01

Stochastic mixed-integer programs (SMIPs) deal with optimization under uncertainty at many levels of the decision-making process. When solved as extensive formulation mixed-integer programs, problem instances can exceed available memory on a single workstation. In order to overcome this limitation, we present PIPS-SBB: a distributed-memory parallel stochastic MIP solver that takes advantage of parallelism at multiple levels of the optimization process. We also show promising results on the SIPLIB benchmark by combining methods known for accelerating Branch and Bound (B&B) methods with new ideas that leverage the structure of SMIPs. Finally, we expect the performance of PIPS-SBB to improve further as more functionality is added in the future.
On the utility of threads for data parallel programming

NASA Technical Reports Server (NTRS)

Fahringer, Thomas; Haines, Matthew; Mehrotra, Piyush

1995-01-01

Threads provide a useful programming model for asynchronous behavior because of their ability to encapsulate units of work that can then be scheduled for execution at runtime, based on the dynamic state of a system. Recently, the threaded model has been applied to the domain of data parallel scientific codes, and initial reports indicate that the threaded model can produce performance gains over non-threaded approaches, primarily through the use of overlapping useful computation with communication latency. However, overlapping computation with communication is possible without the benefit of threads if the communication system supports asynchronous primitives, and this comparison has not been made in previous papers. This paper provides a critical look at the utility of lightweight threads as applied to data parallel scientific programming.
Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems

NASA Technical Reports Server (NTRS)

Hinchey, Michael G.; Rash, James L.; Rouff, Christopher A.

2005-01-01

The manual application of formal methods in system specification has produced successes, but in the end, despite any claims and assertions by practitioners, there is no provable relationship between a manually derived system specification or formal model and the customer's original requirements. Complex parallel and distributed systems present the worst case implications for today's dearth of viable approaches for achieving system dependability. No avenue other than formal methods constitutes a serious contender for resolving the problem, and so recognition of requirements-based programming has come at a critical juncture. We describe a new, NASA-developed automated requirement-based programming method that can be applied to certain classes of systems, including complex parallel and distributed systems, to achieve a high degree of dependability.
A design methodology for portable software on parallel computers

NASA Technical Reports Server (NTRS)

Nicol, David M.; Miller, Keith W.; Chrisman, Dan A.

1993-01-01

This final report for research that was supported by grant number NAG-1-995 documents our progress in addressing two difficulties in parallel programming. The first difficulty is developing software that will execute quickly on a parallel computer. The second difficulty is transporting software between dissimilar parallel computers. In general, we expect that more hardware-specific information will be included in software designs for parallel computers than in designs for sequential computers. This inclusion is an instance of portability being sacrificed for high performance. New parallel computers are being introduced frequently. Trying to keep one's software on the current high performance hardware, a software developer almost continually faces yet another expensive software transportation. The problem of the proposed research is to create a design methodology that helps designers to more precisely control both portability and hardware-specific programming details. The proposed research emphasizes programming for scientific applications. We completed our study of the parallelizability of a subsystem of the NASA Earth Radiation Budget Experiment (ERBE) data processing system. This work is summarized in section two. A more detailed description is provided in Appendix A ('Programming Practices to Support Eventual Parallelism'). Mr. Chrisman, a graduate student, wrote and successfully defended a Ph.D. dissertation proposal which describes our research associated with the issues of software portability and high performance. The list of research tasks is specified in the proposal. The proposal 'A Design Methodology for Portable Software on Parallel Computers' is summarized in section three and is provided in its entirety in Appendix B. We are currently studying a proposed subsystem of the NASA Clouds and the Earth's Radiant Energy System (CERES) data processing system. This software is the proof-of-concept for the Ph.D. dissertation.
We have implemented and measured the performance of a portion of this subsystem on the Intel iPSC/2 parallel computer. These results are provided in section four. Our future work is summarized in section five, our acknowledgements are stated in section six, and references for published papers associated with NAG-1-995 are provided in section seven.
Eco-Cities: Possible or Purely Utopian?

DTIC Science & Technology

2009-12-01

...2005, Disney modified their building plans for Hong Kong Disneyland by shifting the angle of the front gate by twelve degrees in order to abide by
Massively parallel sparse matrix function calculations with NTPoly

NASA Astrophysics Data System (ADS)

Dawson, William; Nakajima, Takahito

2018-04-01

We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.

PubMed

Halic, Tansel; Ahn, Woojin; De, Suvranu

2014-01-01

This work presents pWeb, a new language and compiler for parallelization of client-side compute intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck of computationally intensive applications including visualization of complex scenes, real time physical simulations and image processing compared to native ones. The new proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.
Automatic data partitioning on distributed memory multicomputers. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Gupta, Manish

1992-01-01

Distributed-memory parallel computers are increasingly being used to provide high levels of performance for scientific applications. Unfortunately, such machines are not very easy to program. A number of research efforts seek to alleviate this problem by developing compilers that take over the task of generating communication. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial task of determining a good data partitioning scheme. A novel approach is presented, the constraints-based approach, to the problem of automatic data partitioning for numeric programs. In this approach, the compiler identifies some desirable requirements on the distribution of various arrays being referenced in each statement, based on performance considerations. These desirable requirements are referred to as constraints. For each constraint, the compiler determines a quality measure that captures its importance with respect to the performance of the program. The quality measure is obtained through static performance estimation, without actually generating the target data-parallel program with explicit communication. Each data distribution decision is taken by combining all the relevant constraints. The compiler attempts to resolve any conflicts between constraints such that the overall execution time of the parallel program is minimized. This approach has been implemented as part of a compiler called Paradigm, that accepts Fortran 77 programs, and specifies the partitioning scheme to be used for each array in the program. We have obtained results on some programs taken from the Linpack and Eispack libraries, and the Perfect Benchmarks. 
These results are quite promising, and demonstrate the feasibility of automatic data partitioning for a significant class of scientific application programs with regular computations.
GRADSPMHD: A parallel MHD code based on the SPH formalism

NASA Astrophysics Data System (ADS)

Vanaverbeke, S.; Keppens, R.; Poedts, S.

2014-03-01

We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇·B constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 620503 No. of bytes in distributed program, including test data, etc.: 19837671 Distribution format: tar.gz Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI.
RAM: ~30 MB for a Sedov test including 15625 particles on a single CPU. Classification: 12. Nature of problem: Evolution of a plasma in the ideal MHD approximation. Solution method: The equations of magnetohydrodynamics are solved using the SPH method. Running time: The test provided takes approximately 20 min using 4 processors.
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

PubMed

Shrimankar, D D; Sathe, S R

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

PubMed Central

Shrimankar, D. D.; Sathe, S. R.

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot be scaled beyond a single SMP node, whereas programs written in MPI can span more than one SMP node, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead is significant even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
Parallel Logic Programming and Parallel Systems Software and Hardware

DTIC Science & Technology

1989-07-29

Conference, Dallas TX. January 1985. (55) [Rous75] Roussel, P., "PROLOG: Manuel de Reference et d’Utilisation", Group d’Intelligence Artificielle, Universite d...completed. Tools were provided for software development using artificial intelligence techniques. AI software for massively parallel architectures was...using artificial intelligence techniques. AI software for massively parallel architectures was started. 1. Introduction We describe research conducted
The force on the flex: Global parallelism and portability

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1986-01-01

A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000-line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporation's implementation of a Fortran system in the parallel environment.
Cellular automata with object-oriented features for parallel molecular network modeling.

PubMed

Zhu, Hao; Wu, Yinghui; Huang, Sui; Sun, Yan; Dhar, Pawan

2005-06-01

Cellular automata are an important modeling paradigm for studying the dynamics of large, parallel systems composed of multiple, interacting components. However, to model biological systems, cellular automata need to be extended beyond the large-scale parallelism and intensive communication in order to capture two fundamental properties characteristic of complex biological systems: hierarchy and heterogeneity. This paper proposes extensions to a cellular automata language, Cellang, to meet this purpose. The extended language, with object-oriented features, can be used to describe the structure and activity of parallel molecular networks within cells. Capabilities of this new programming language include object structure to define molecular programs within a cell, floating-point data type and mathematical functions to perform quantitative computation, message passing capability to describe molecular interactions, as well as new operators, statements, and built-in functions. We discuss relevant programming issues of these features, including the object-oriented description of molecular interactions with molecule encapsulation, message passing, and the description of heterogeneity and anisotropy at the cell and molecule levels. By enabling the integration of modeling at the molecular level with system behavior at cell, tissue, organ, or even organism levels, the program will help improve our understanding of how complex and dynamic biological activities are generated and controlled by parallel functioning of molecular networks. Index Terms-Cellular automata, modeling, molecular network, object-oriented.
Efficient Thread Labeling for Monitoring Programs with Nested Parallelism

NASA Astrophysics Data System (ADS)

Ha, Ok-Kyoon; Kim, Sun-Sook; Jun, Yong-Kee

It is difficult and cumbersome to detect data races that occur in an execution of parallel programs. Any on-the-fly race detection technique using Lamport's happened-before relation needs a thread labeling scheme for generating unique identifiers which maintain logical concurrency information for the parallel threads. NR labeling is an efficient thread labeling scheme for the fork-join program model with nested parallelism, because its efficiency depends only on the nesting depth for every fork and join operation. This paper presents an improved NR labeling, called e-NR labeling, in which every thread generates its label by inheriting the pointer to its ancestor list from the parent threads or by updating the pointer in a constant amount of time and space. This labeling is more efficient than the NR labeling, because its efficiency does not depend on the nesting depth for every fork and join operation. Some experiments were performed with OpenMP programs having nesting depths of three or four and maximum parallelisms varying from 10,000 to 1,000,000. The results show that e-NR is 5 times faster than NR labeling and 4.3 times faster than OS labeling in the average time for creating and maintaining the thread labels. In average space required for labeling, it is 3.5 times smaller than NR labeling and 3 times smaller than OS labeling.

User-Defined Data Distributions in High-Level Programming Languages

NASA Technical Reports Server (NTRS)

Diaconescu, Roxana E.; Zima, Hans P.

2006-01-01

One of the characteristic features of today's high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.
Block-Parallel Data Analysis with DIY2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morozov, Dmitriy; Peterka, Tom

DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
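The block-structured idea in this abstract (decompose data into blocks, assign blocks to processing elements, iterate over blocks, then combine) can be illustrated with a small sketch. This is not the DIY2 API, which is a C++ library; the names decompose, block_sum and block_parallel_sum are hypothetical, and a Python thread pool stands in for the runtime's processing elements.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(data, nblocks):
    """Split data into nblocks contiguous blocks of near-equal size."""
    n = len(data)
    bounds = [n * k // nblocks for k in range(nblocks + 1)]
    return [data[bounds[k]:bounds[k + 1]] for k in range(nblocks)]


def block_sum(block):
    """Per-block computation; here simply the sum of the block."""
    return sum(block)


def block_parallel_sum(data, nblocks=4, nthreads=2):
    """Run block_sum over every block, then merge the partial results."""
    blocks = decompose(data, nblocks)
    # The number of blocks is independent of the number of threads, so the
    # same code runs serially (nthreads=1) or with several threads.
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        partials = list(pool.map(block_sum, blocks))
    return sum(partials)
```

Keeping the block count separate from the thread count mirrors the point made in the abstract: the decomposition describes the computation, while the mapping of blocks onto execution resources is left to the runtime.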
Parallel Rendering of Large Time-Varying Volume Data

NASA Technical Reports Server (NTRS)

Garbutt, Alexander E.

2005-01-01

Interactive visualization of large time-varying 3D volume datasets has been and still is a great challenge to the modern computational world. It stretches the limits of the memory capacity, the disk space, the network bandwidth and the CPU speed of a conventional computer. In this SURF project, we propose to develop a parallel volume rendering program on SGI's Prism, a cluster computer equipped with state-of-the-art graphic hardware. The proposed program combines both parallel computing and hardware rendering in order to achieve an interactive rendering rate. We use 3D texture mapping and a hardware shader to implement 3D volume rendering on each workstation. We use SGI's VisServer to enable remote rendering using Prism's graphic hardware. And last, we will integrate this new program with ParVox, a parallel distributed visualization system developed at JPL. At the end of the project, we will demonstrate remote interactive visualization using this new hardware volume renderer on JPL's Prism System using a time-varying dataset from selected JPL applications.
Solving Partial Differential Equations in a data-driven multiprocessor environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.

1988-12-31

Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-flow architectures have been proposed for the exploitation of large scale parallelism. The implementation of some Partial Differential Equation solvers (such as the Jacobi method) on a tagged token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asynchronous methods in a data-driven environment. New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in dataflow PDE program graphs.
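For readers unfamiliar with the Jacobi method named in this abstract, the following is a minimal serial sketch for Laplace's equation on a rectangular grid with fixed boundary values. It does not reproduce the report's data-flow implementation; the function name jacobi_laplace and the fixed iteration count are assumptions for illustration. Each new interior value depends only on the previous iterate, so all interior points of one sweep could be updated independently, which is what makes the method a natural candidate for parallel and data-driven execution.

```python
def jacobi_laplace(grid, iters=500):
    """Jacobi sweeps for Laplace's equation on a 2D grid.

    grid is a list of rows; its outer ring holds the fixed boundary values
    and its interior holds the initial guess. The input is not modified.
    """
    rows, cols = len(grid), len(grid[0])
    u = [row[:] for row in grid]
    for _ in range(iters):
        # A separate output buffer keeps every update reading only the
        # previous iterate (Jacobi), never a value from the current sweep.
        new = [row[:] for row in u]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1])
        u = new
    return u
```

The chaotic relaxation mentioned in the abstract relaxes exactly this constraint: updates may read whichever neighbour values happen to be available, old or new, rather than waiting for a complete previous sweep.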
A design concept of parallel elasticity extracted from biological muscles for engineered actuators.

PubMed

Chen, Jie; Jin, Hongzhe; Iida, Fumiya; Zhao, Jie

2016-08-23

Series elastic actuation that takes inspiration from biological muscle-tendon units has been extensively studied and used to address the challenges (e.g. energy efficiency, robustness) existing in purely stiff robots. However, there also exists another form of passive property in biological actuation, parallel elasticity within muscles themselves, and our knowledge of it is limited: for example, there is still no general design strategy for the elasticity profile. When we look at nature, on the other hand, there seems a universal agreement in biological systems: experimental evidence has suggested that a concave-upward elasticity behaviour is exhibited within the muscles of animals. Seeking to draw possible design clues for elasticity in parallel with actuators, we use a simplified joint model to investigate the mechanisms behind this biologically universal preference of muscles. Actuation of the model is identified from general biological joints and further reduced with a specific focus on muscle elasticity aspects, for the sake of easy implementation. By examining various elasticity scenarios, one without elasticity and three with elasticity of different profiles, we find that parallel elasticity generally exerts contradictory influences on energy efficiency and disturbance rejection, due to the mechanical impedance shift thus caused. The trade-off analysis between them also reveals that concave parallel elasticity is able to achieve a more advantageous balance than linear and convex ones. It is expected that the results could contribute to our further understanding of muscle elasticity and provide a theoretical guideline on how to properly design parallel elasticity behaviours for engineering systems such as artificial actuators and robotic joints.
cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.

PubMed

Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro

2016-01-01

Next-generation sequencing can determine DNA bases and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format and the compressed binary version (BAM) of it. SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and can execute fast. However, SAMtools requires an additional implementation to be used in parallel with, for example, OpenMP (Open Multi-Processing) libraries. For the accumulation of next-generation sequencing data, a simple parallelization program, which can support cloud and PC cluster environments, is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
Visual analysis of inter-process communication for large-scale parallel computing.

PubMed

Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

2009-01-01

In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.
A third-generation density-functional-theory-based method for calculating canonical molecular orbitals of large molecules.

PubMed

Hirano, Toshiyuki; Sato, Fumitoshi

2014-07-28

We used grid-free modified Cholesky decomposition (CD) to develop a density-functional-theory (DFT)-based method for calculating the canonical molecular orbitals (CMOs) of large molecules. Our method can be used to calculate standard CMOs, analytically compute exchange-correlation terms, and maximise the capacity of next-generation supercomputers. Cholesky vectors were first analytically downscaled using low-rank pivoted CD and CD with adaptive metric (CDAM). The obtained Cholesky vectors were distributed and stored on each computer node in a parallel computer, and the Coulomb, Fock exchange, and pure exchange-correlation terms were calculated by multiplying the Cholesky vectors without evaluating molecular integrals in self-consistent field iterations. Our method enables DFT and massively distributed memory parallel computers to be used in order to very efficiently calculate the CMOs of large molecules.
Hybrid Parallel Contour Trees, Version 1.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sewell, Christopher; Fasel, Patricia; Carr, Hamish

A common operation in scientific visualization is to compute and render a contour of a data set. Given a function of the form f : R^d -> R, a level set is defined as an inverse image f^-1(h) for an isovalue h, and a contour is a single connected component of a level set. The Reeb graph can then be defined to be the result of contracting each contour to a single point, and is well defined for Euclidean spaces or for general manifolds. For simple domains, the graph is guaranteed to be a tree, and is called the contour tree. Analysis can then be performed on the contour tree in order to identify isovalues of particular interest, based on various metrics, and render the corresponding contours, without having to know such isovalues a priori. This code is intended to be the first data-parallel algorithm for computing contour trees. Our implementation will use the portable data-parallel primitives provided by Nvidia’s Thrust library, allowing us to compile our same code for both GPUs and multi-core CPUs. Native OpenMP and purely serial versions of the code will likely also be included. It will also be extended to provide a hybrid data-parallel / distributed algorithm, allowing scaling beyond a single GPU or CPU.
Parallel machine architecture and compiler design facilities

NASA Technical Reports Server (NTRS)

Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

1990-01-01

The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target different machine architectures) is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

1999-01-01

As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
DOE SBIR Phase-1 Report on Hybrid CPU-GPU Parallel Development of the Eulerian-Lagrangian Barracuda Multiphase Program

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dr. Dale M. Snider

2011-02-28

This report gives the results of the Phase-1 work on demonstrating greater than 10x speedup of the Barracuda computer program using parallel methods and GPU processors (General-Purpose Graphics Processing Unit or Graphics Processing Unit). Phase-1 demonstrated a 12x speedup on a typical Barracuda function using the GPU processor. The problem test case used about 5 million particles and 250,000 Eulerian grid cells. The relative speedup, compared to a single CPU, increases with increased number of particles giving greater than 12x speedup. Phase-1 work provided a path for reformatting data structure modifications to give good parallel performance while keeping a friendly environment for new physics development and code maintenance. The implementation of data structure changes will be in Phase-2. Phase-1 laid the groundwork for the complete parallelization of Barracuda in Phase-2, with the caveat that implemented computer practices for parallel programming done in Phase-1 give immediate speedup in the current Barracuda serial running code. The Phase-1 tasks were completed successfully, laying the framework for Phase-2. The detailed results of Phase-1 are within this document. In general, the speedup of one function would be expected to be higher than the speedup of the entire code because of I/O functions and communication between the algorithms. However, because one of the most difficult Barracuda algorithms was parallelized in Phase-1 and because advanced parallelization methods and proposed parallelization optimization techniques identified in Phase-1 will be used in Phase-2, an overall Barracuda code speedup (relative to a single CPU) is expected to be greater than 10x. This means that a job which takes 30 days to complete will be done in 3 days.
Tasks completed in Phase-1 are: Task 1: Profile the entire Barracuda code and select which subroutines are to be parallelized (See Section Choosing a Function to Accelerate); Task 2: Select a GPU consultant company and jointly parallelize subroutines (CPFD chose the small business EMPhotonics as the Phase-1 technical partner. See Section Technical Objective and Approach); Task 3: Integrate parallel subroutines into Barracuda (See Section Results from Phase-1 and its subsections); Task 4: Testing, refinement, and optimization of parallel methodology (See Section Results from Phase-1 and Section Result Comparison Program); Task 5: Integrate Phase-1 parallel subroutines into Barracuda and release (See Section Results from Phase-1 and its subsections); Task 6: Roadmap of Phase-2 (See Section Plan for Phase-2). With the completion of Phase 1 we have the base understanding to completely parallelize Barracuda. An overview of the work to move Barracuda to a parallelized code is given in Plan for Phase-2.
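The remark in this abstract that the speedup of one function is usually higher than the speedup of the whole code is the situation described by Amdahl's law. The sketch below is an illustration only: the report does not state what fraction of Barracuda's run time the accelerated function accounts for, so the fraction p is an assumed input, and the helper names amdahl_speedup and runtime_days are hypothetical.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the run time is sped up s times.

    p is the accelerated fraction of the original run time (0 <= p <= 1);
    the remaining 1 - p (I/O, communication, serial code) is unchanged.
    """
    return 1.0 / ((1.0 - p) + p / s)


def runtime_days(serial_days, speedup):
    """Wall-clock time of a job after applying an overall speedup."""
    return serial_days / speedup
```

With an assumed p of 0.9 and the 12x function speedup quoted above, amdahl_speedup(0.9, 12.0) is roughly 5.7, well below 12, which shows why the untouched part of the code limits the overall gain. The abstract's own arithmetic corresponds to runtime_days(30.0, 10.0), which gives 3 days.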
Pure quasi-P wave equation and numerical solution in 3D TTI media

NASA Astrophysics Data System (ADS)

Zhang, Jian-Min; He, Bing-Shou; Tang, Huai-Gu

2017-03-01

Based on the pure quasi-P wave equation in transverse isotropic media with a vertical symmetry axis (VTI media), a quasi-P wave equation is obtained in transverse isotropic media with a tilted symmetry axis (TTI media). This is achieved using projection transformation, which rotates the direction vector in the coordinate system of observation toward the direction vector for the coordinate system in which the z-component is parallel to the symmetry axis of the TTI media. The equation has a simple form, is easily calculated, is not influenced by the pseudo-shear wave, and can be calculated reliably when δ is greater than ɛ. The finite difference method is used to solve the equation. In addition, a perfectly matched layer (PML) absorbing boundary condition is obtained for the equation. Theoretical analysis and numerical simulation results with forward modeling prove that the equation can accurately simulate a quasi-P wave in TTI medium.
Electrolytic Migration of Ag-Pd Alloy Wires with Various Pd Contents

NASA Astrophysics Data System (ADS)

Lin, Yan-Cheng; Chen, Chun-Hao; He, Yu-Zhen; Chen, Sheng-Chi; Chuang, Tung-Han

2018-07-01

During Ag ion migration in an aqueous water drop covering a pair of parallel Ag-Pd wires under current stressing, hydrogen bubbles form first from the cathode, followed by the appearance of pure Ag dendrites on the cathodic wire. In this study, Ag dendrites with a diameter of 0.2-0.4 μm grew toward the anodic wire. The growth rate (v) of these dendrites decreased with the Pd content (c) with a linear relationship of: v = 10.02 - 0.43c. Accompanying the growth of pure Ag dendrites was the formation of a continuous layer of crystallographic Ag2O particles on the surface of the anodic wire. The deposition of such insulating Ag2O products did not prevent the contact of Ag dendrites with the anodic Ag-Pd wire or the short circuit of the wire couple.
Developmental and individual differences in pure numerical estimation.

PubMed

Booth, Julie L; Siegler, Robert S

2006-01-01

The authors examined developmental and individual differences in pure numerical estimation, the type of estimation that depends solely on knowledge of numbers. Children between kindergarten and 4th grade were asked to solve 4 types of numerical estimation problems: computational, numerosity, measurement, and number line. In Experiment 1, kindergartners and 1st, 2nd, and 3rd graders were presented problems involving the numbers 0-100; in Experiment 2, 2nd and 4th graders were presented problems involving the numbers 0-1,000. Parallel developmental trends, involving increasing reliance on linear representations of numbers and decreasing reliance on logarithmic ones, emerged across different types of estimation. Consistent individual differences across tasks were also apparent, and all types of estimation skill were positively related to math achievement test scores. Implications for understanding of mathematics learning in general are discussed. Copyright 2006 APA, all rights reserved.
Electrolytic Migration of Ag-Pd Alloy Wires with Various Pd Contents

NASA Astrophysics Data System (ADS)

Lin, Yan-Cheng; Chen, Chun-Hao; He, Yu-Zhen; Chen, Sheng-Chi; Chuang, Tung-Han

2018-03-01

During Ag ion migration in an aqueous water drop covering a pair of parallel Ag-Pd wires under current stressing, hydrogen bubbles form first from the cathode, followed by the appearance of pure Ag dendrites on the cathodic wire. In this study, Ag dendrites with a diameter of 0.2-0.4 μm grew toward the anodic wire. The growth rate (v) of these dendrites decreased with the Pd content (c) with a linear relationship of: v = 10.02 - 0.43c. Accompanying the growth of pure Ag dendrites was the formation of a continuous layer of crystallographic Ag2O particles on the surface of the anodic wire. The deposition of such insulating Ag2O products did not prevent the contact of Ag dendrites with the anodic Ag-Pd wire or the short circuit of the wire couple.
Economical launching and accelerating control strategy for a single-shaft parallel hybrid electric bus

NASA Astrophysics Data System (ADS)

Yang, Chao; Song, Jian; Li, Liang; Li, Shengbo; Cao, Dongpu

2016-08-01

This paper presents an economical launching and accelerating mode, including four ordered phases: pure electrical driving, clutch engagement and engine start-up, engine active charging, and engine driving, which can be fit for the alternating conditions and improve the fuel economy of hybrid electric bus (HEB) during typical city-bus driving scenarios. By utilizing the fast response feature of electric motor (EM), an adaptive controller for EM is designed to realize the power demand during the pure electrical driving mode, the engine starting mode and the engine active charging mode. Concurrently, the smoothness issue induced by the sequential mode transitions is solved with a coordinated control logic for engine, EM and clutch. Simulation and experimental results show that the proposed launching and accelerating mode and its control methods are effective in improving the fuel economy and ensure the drivability during the fast transition between the operation modes of HEB.
Conformal superalgebras via tractor calculus

NASA Astrophysics Data System (ADS)

Lischewski, Andree

2015-01-01

We use the manifestly conformally invariant description of a Lorentzian conformal structure in terms of a parabolic Cartan geometry in order to introduce a superalgebra structure on the space of twistor spinors and normal conformal vector fields formulated in purely algebraic terms on parallel sections in tractor bundles. Via a fixed metric in the conformal class, one reproduces a conformal superalgebra structure that has been considered in the literature before. The tractor approach, however, makes clear that the failure of this object to be a Lie superalgebra in certain cases is due to purely algebraic identities on the spinor module and to special properties of the conformal holonomy representation. Moreover, it naturally generalizes to higher signatures. This yields new formulas for constructing new twistor spinors and higher order normal conformal Killing forms out of existing ones, generalizing the well-known spinorial Lie derivative. Moreover, we derive restrictions on the possible dimension of the space of twistor spinors in any metric signature.
Monitoring Data-Structure Evolution in Distributed Message-Passing Programs

NASA Technical Reports Server (NTRS)

Sarukkai, Sekhar R.; Beers, Andrew; Woodrow, Thomas S. (Technical Monitor)

1996-01-01

Monitoring the evolution of data structures in parallel and distributed programs is critical for debugging their semantics and performance. However, the current state of the art in tracking and presenting data-structure information on parallel and distributed environments is cumbersome and does not scale. In this paper we present a methodology that automatically tracks memory bindings (not the actual contents) of static and dynamic data-structures of message-passing C programs, using PVM. With the help of a number of examples we show that in addition to determining the impact of memory allocation overheads on program performance, graphical views can help in debugging the semantics of program execution. Scalable animations of virtual address bindings of source-level data-structures are used for debugging the semantics of parallel programs across all processors. In conjunction with light-weight core-files, this technique can be used to complement traditional debuggers on single processors. Detailed information (such as data-structure contents), on specific nodes, can be determined using traditional debuggers after the data structure evolution leading to the semantic error is observed graphically.
Synthesis of Enantiomerically Pure 6-Substituted-Piperazine-2-Acetic Acid Esters as Intermediates for Library Production.

PubMed

Chamakuri, Srinivas; Jain, Prashi; Guduru, Shiva Krishna Reddy; Arney, Joseph Winston; MacKenzie, Kevin; Santini, Conrad; Young, Damian W

2018-05-11

Amino acids from the chiral pool have been used to produce a 24-member branch of 2,6-disubstituted piperazine scaffolds suitable for use in compound library production. Each scaffold was obtained as a single absolute stereoisomer in multi-gram quantities. Stereochemistry was confirmed by 2D NMR protocols and enantiomeric purity was determined by chiral HPLC. The scaffolds are intended for use as intermediates in parallel synthesis of small-molecule libraries.

[Pain and opioid dependency as multilevel network phenomenon : Theoretical and metatheoretical aspects].

PubMed

Tretter, F

2016-08-01

Methodological reflections on pain research and pain therapy focussing on addiction risks are addressed in this article. Starting from the incompleteness of objectification of the purely subjectively fully understandable phenomena of pain and addiction, the relevance of a comprehensive general psychology is underlined. It is shown that reduction of pain and addiction to a mainly focally arguing neurobiology is only possible if both disciplines have a systemic concept of pain and addiction. With this aim, parallelized conceptual network models are presented.
Soviet Research in Production and Physical Metallurgy of Pure Metals

DTIC Science & Technology

1964-01-10

thereby the level of internal friction. Conclusions 1. A methodology was developed for growing nP27bdemn slag crystals from the gaseous phase using the...case of zinc and cadmium the base may be situated perpendicularly to the axis of the specimen, i.e., parallel to the crystallization front. The same...separately, the latter being soldered to the ring with copper-zinc solder. With the modulator in a position as shown in Figure 2, the geometrical center
Command/response protocols and concurrent software

NASA Technical Reports Server (NTRS)

Bynum, W. L.

1987-01-01

A version of the program to control the parallel jaw gripper is documented. The parallel jaw end-effector hardware and the Intel 8031 processor that is used to control the end-effector are briefly described. A general overview of the controller program is given, and a complete description of the program's structure and design is included. There are three appendices: a memory map of the on-chip RAM, a cross-reference listing of the self-scheduling routines, and a summary of the top-level and monitor commands.
Computer programs for adjusting the mechanical properties of 2-inch dimension lumber for changes in moisture content

Treesearch

James W. Evans; Jane K. Evans; David W. Green

1990-01-01

This paper presents computer programs for adjusting the mechanical properties of 2-in. dimension lumber for changes in moisture content. Mechanical properties adjusted are modulus of rupture, ultimate tensile stress parallel to the grain, ultimate compressive stress parallel to the grain, and flexural modulus of elasticity. The models are valid for moisture contents...
Selective laser melting of high-performance pure tungsten: parameter design, densification behavior and mechanical properties

PubMed Central

Zhou, Kesong; Ma, Wenyou; Attard, Bonnie; Zhang, Panpan; Kuang, Tongchun

2018-01-01

Selective laser melting (SLM) additive manufacturing of pure tungsten encounters nearly all intractable difficulties of SLM metals fields due to its intrinsic properties. The key factors, including powder characteristics, layer thickness, and laser parameters of SLM high density tungsten are elucidated and discussed in detail. The main parameters were designed from theoretical calculations prior to the SLM process and experimentally optimized. Pure tungsten products with a density of 19.01 g/cm3 (98.50% theoretical density) were produced using SLM with the optimized processing parameters. A high density microstructure is formed without significant balling or macrocracks. The formation mechanisms for pores and the densification behaviors are systematically elucidated. Electron backscattered diffraction analysis confirms that the columnar grains stretch across several layers and parallel to the maximum temperature gradient, which can ensure good bonding between the layers. The mechanical properties of the SLM-produced tungsten are comparable to that produced by the conventional fabrication methods, with hardness values exceeding 460 HV0.05 and an ultimate compressive strength of about 1 GPa. This finding offers new potential applications of refractory metals in additive manufacturing. PMID:29707073
Flexible and unique representations of two-digit decimals.

PubMed

Zhang, Li; Chen, Min; Lin, Chongde; Szűcs, Denes

2014-09-01

We examined the representation of two-digit decimals through studying distance and compatibility effects in magnitude comparison tasks in four experiments. Using number pairs with different leftmost digits, we found both the second digit distance effect and compatibility effect with two-digit integers but only the second digit distance effect with two-digit pure decimals. This suggests that both integers and pure decimals are processed in a compositional manner. In contrast, neither the second digit distance effect nor the compatibility effect was observed in two-digit mixed decimals, thereby showing no evidence for compositional processing of two-digit mixed decimals. However, when the relevance of the rightmost digit processing was increased by adding some decimals pairs with the same leftmost digits, both pure and mixed decimals produced the compatibility effect. Overall, results suggest that the processing of decimals is flexible and depends on the relevance of unique digit positions. This processing mode is different from integer analysis in that two-digit mixed decimals demonstrate parallel compositional processing only when the rightmost digit is relevant. Findings suggest that people probably do not represent decimals by simply ignoring the decimal point and converting them to natural numbers. Copyright © 2014 Elsevier B.V. All rights reserved.
Selective laser melting of high-performance pure tungsten: parameter design, densification behavior and mechanical properties.

PubMed

Tan, Chaolin; Zhou, Kesong; Ma, Wenyou; Attard, Bonnie; Zhang, Panpan; Kuang, Tongchun

2018-01-01

Selective laser melting (SLM) additive manufacturing of pure tungsten encounters nearly all intractable difficulties of SLM metals fields due to its intrinsic properties. The key factors, including powder characteristics, layer thickness, and laser parameters of SLM high density tungsten are elucidated and discussed in detail. The main parameters were designed from theoretical calculations prior to the SLM process and experimentally optimized. Pure tungsten products with a density of 19.01 g/cm3 (98.50% theoretical density) were produced using SLM with the optimized processing parameters. A high density microstructure is formed without significant balling or macrocracks. The formation mechanisms for pores and the densification behaviors are systematically elucidated. Electron backscattered diffraction analysis confirms that the columnar grains stretch across several layers and parallel to the maximum temperature gradient, which can ensure good bonding between the layers. The mechanical properties of the SLM-produced tungsten are comparable to that produced by the conventional fabrication methods, with hardness values exceeding 460 HV0.05 and an ultimate compressive strength of about 1 GPa. This finding offers new potential applications of refractory metals in additive manufacturing.
NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations

NASA Astrophysics Data System (ADS)

Valiev, M.; Bylaska, E. J.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Van Dam, H. J. J.; Wang, D.; Nieplocha, J.; Apra, E.; Windus, T. L.; de Jong, W. A.

2010-09-01

The latest release of NWChem delivers an open-source computational chemistry package with extensive capabilities for large scale simulations of chemical and biological systems. Utilizing a common computational framework, diverse theoretical descriptions can be used to provide the best solution for a given scientific problem. Scalable parallel implementations and modular software design enable efficient utilization of current computational architectures. This paper provides an overview of NWChem focusing primarily on the core theoretical modules provided by the code and their parallel performance. Program summary. Program title: NWChem. Catalogue identifier: AEGI_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGI_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Open Source Educational Community License. No. of lines in distributed program, including test data, etc.: 11 709 543. No. of bytes in distributed program, including test data, etc.: 680 696 106. Distribution format: tar.gz. Programming language: Fortran 77, C. Computer: all Linux based workstations and parallel supercomputers, Windows and Apple machines. Operating system: Linux, OS X, Windows. Has the code been vectorised or parallelized?: Code is parallelized. Classification: 2.1, 2.2, 3, 7.3, 7.7, 16.1, 16.2, 16.3, 16.10, 16.13. Nature of problem: Large-scale atomistic simulations of chemical and biological systems require efficient and reliable methods for ground and excited solutions of many-electron Hamiltonian, analysis of the potential energy surface, and dynamics. Solution method: Ground and excited solutions of many-electron Hamiltonian are obtained utilizing density-functional theory, many-body perturbation approach, and coupled cluster expansion. These solutions or a combination thereof with classical descriptions are then used to analyze potential energy surface and perform dynamical simulations. 
Additional comments: Full documentation is provided in the distribution file. This includes an INSTALL file giving details of how to build the package. A set of test runs is provided in the examples directory. The distribution file for this program is over 90 Mbytes and therefore is not delivered directly when download or email is requested. Instead, an HTML file giving details of how the program can be obtained is sent. Running time: Running time depends on the size of the chemical system, complexity of the method, number of CPUs and the computational task. It ranges from several seconds for serial DFT energy calculations on a few atoms to several hours for parallel coupled cluster energy calculations on tens of atoms or ab-initio molecular dynamics simulation on hundreds of atoms.
Prevalence of c-KIT Mutations in Gonadoblastoma and Dysgerminomas of Patients with Disorders of Sex Development (DSD) and Ovarian Dysgerminomas

PubMed Central

Hersmus, Remko; Stoop, Hans; van de Geijn, Gert Jan; Eini, Ronak; Biermann, Katharina; Oosterhuis, J. Wolter; DHooge, Catharina; Schneider, Dominik T.; Meijssen, Isabelle C.; Dinjens, Winand N. M.; Dubbink, Hendrikus Jan; Drop, Stenvert L. S.; Looijenga, Leendert H. J.

2012-01-01

Activating c-KIT mutations (exons 11 and 17) are found in 10–40% of testicular seminomas, the majority being missense point mutations (codon 816). Malignant ovarian dysgerminomas represent ∼3% of all ovarian cancers in Western countries, resembling testicular seminomas, regarding chromosomal aberrations and c-KIT mutations. DSD patients with specific Y-sequences have an increased risk for Type II Germ Cell Tumor/Cancer, with gonadoblastoma as precursor progressing to dysgerminoma. Here we present analysis of c-KIT exon 8, 9, 11, 13 and 17, and PDGFRA exon 12, 14 and 18 by conventional sequencing together with mutational analysis of c-KIT codon 816 by a sensitive and specific LightCycler melting curve analysis, confirmed by sequencing. The results are combined with data on TSPY and OCT3/4 expression in a series of 16 DSD patients presenting with gonadoblastoma and dysgerminoma and 15 patients presenting pure ovarian dysgerminomas without DSD. c-KIT codon 816 mutations were detected in five out of the total of 31 cases (all found in pure ovarian dysgerminomas). A synonymous SNP (rs 5578615) was detected in two patients, one DSD patient (with bilateral disease) and one patient with dysgerminoma. Next to these, three codon N822K mutations were detected in the group of 15 pure ovarian dysgerminomas. In total activating c-KIT mutations were found in 53% of ovarian dysgerminomas without DSD. In the group of 16 DSD cases a N505I and D820E mutation was found in a single tumor of a patient with gonadoblastoma and dysgerminoma. No PDGFRA mutations were found. Positive OCT3/4 staining was present in all gonadoblastomas and dysgerminomas investigated, TSPY expression was only seen in the gonadoblastoma/dysgerminoma lesions of the 16 DSD patients. This data supports the existence of two distinct but parallel pathways in the development of dysgerminoma, in which mutational status of c-KIT might parallel the presence of TSPY. PMID:22937135
Investigation of Electrodeposited Alloys and Pure Metals as Substitutes for Zinc and Cadmium for Protective Finishes for Steel Parts of Aircraft

DTIC Science & Technology

1949-09-01

Investigation of Electrodeposited Alloys and Pure Metals as Substitutes for Zinc and Cadmium for...graphs Eight alloys, selected as being superior to pure zinc or cadmium for protecting steel, were evaluated on the basis of static and dynamic...zinc-silver alloy of 25% silver. A tabulated summary of the testing program on all cast and electrodeposited alloys tested is included.
Petascale Simulation Initiative Tech Base: FY2007 Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

May, J; Chen, R; Jefferson, D

The Petascale Simulation Initiative began as an LDRD project in the middle of Fiscal Year 2004. The goal of the project was to develop techniques to allow large-scale scientific simulation applications to better exploit the massive parallelism that will come with computers running at petaflops per second. One of the major products of this work was the design and prototype implementation of a programming model and a runtime system that lets applications extend data-parallel applications to use task parallelism. By adopting task parallelism, applications can use processing resources more flexibly, exploit multiple forms of parallelism, and support more sophisticated multiscale and multiphysics models. Our programming model was originally called the Symponents Architecture but is now known as Cooperative Parallelism, and the runtime software that supports it is called Coop. (However, we sometimes refer to the programming model as Coop for brevity.) We have documented the programming model and runtime system in a submitted conference paper [1]. This report focuses on the specific accomplishments of the Cooperative Parallelism project (as we now call it) under Tech Base funding in FY2007. Development and implementation of the model under LDRD funding alone proceeded to the point of demonstrating a large-scale materials modeling application using Coop on more than 1300 processors by the end of FY2006. Beginning in FY2007, the project received funding from both LDRD and the Computation Directorate Tech Base program. Later in the year, after the three-year term of the LDRD funding ended, the ASC program supported the project with additional funds. The goal of the Tech Base effort was to bring Coop from a prototype to a production-ready system that a variety of LLNL users could work with. 
Specifically, the major tasks that we planned for the project were: (1) Port SARS [former name of the Coop runtime system] to another LLNL platform, probably Thunder or Peloton (depending on when Peloton becomes available); (2) Improve SARS's robustness and ease-of-use, and develop user documentation; and (3) Work with LLNL code teams to help them determine how Symponents could benefit their applications. The original funding request was $296,000 for the year, and we eventually received $252,000. The remainder of this report describes our efforts and accomplishments for each of the goals listed above.
BioSmalltalk: a pure object system and library for bioinformatics.

PubMed

Morales, Hernán F; Giovambattista, Guillermo

2013-09-15

We have developed BioSmalltalk, a new environment system for pure object-oriented bioinformatics programming. Adaptive end-user programming systems tend to become more important for discovering biological knowledge, as is demonstrated by the emergence of open-source programming toolkits for bioinformatics in the past years. Our software is intended to bridge the gap between bioscientists and rapid software prototyping while preserving the possibility of scaling to whole-system biology applications. BioSmalltalk performs better in terms of execution time and memory usage than Biopython and BioPerl for some classical situations. BioSmalltalk is cross-platform and freely available (MIT license) through the Google Project Hosting at http://code.google.com/p/biosmalltalk. Contact: hernan.morales@gmail.com. Supplementary data are available at Bioinformatics online.
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS

NASA Technical Reports Server (NTRS)

Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

We present views and analysis of the execution of several PVM (Parallel Virtual Machine) codes for Computational Fluid Dynamics on a network of Sparcstations, including: (1) NAS Parallel Benchmarks CG and MG; (2) a multi-partitioning algorithm for NAS Parallel Benchmark SP; and (3) an overset grid flowsolver. These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains: (1) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (2) Monitor, a library of runtime trace-collection routines; (3) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (4) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran 77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses X11R5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (1) the impact of long message latencies; (2) the impact of multiprogramming overheads and associated load imbalance; (3) cache and virtual-memory effects; and (4) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. 
Some of the features planned for the near future include: (1) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets; and (2) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
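The idea of adjusting timestamps so that no message appears to travel backwards in time can be illustrated with a minimal Python sketch. This is not the AIMS algorithm, which the abstract does not describe; it assumes a single constant offset between two clocks, and the function names, the `min_latency` parameter and the sample trace are all hypothetical.

```python
def min_offset_to_restore_causality(messages, min_latency=0.0):
    """Smallest constant shift of the receiver clock that makes every
    message arrive no earlier than it was sent (plus min_latency).

    messages: list of (send_time_on_sender_clock, recv_time_on_receiver_clock).
    Assumption: the skew is constant (zero drift), so one offset suffices.
    """
    worst_violation = max((send + min_latency) - recv for send, recv in messages)
    return max(0.0, worst_violation)


def apply_offset(messages, offset):
    """Shift all receiver-side timestamps by the given offset."""
    return [(send, recv + offset) for send, recv in messages]


# Hypothetical trace: the first message appears to arrive before it was sent.
trace = [(10.0, 9.5), (12.0, 12.5)]
offset = min_offset_to_restore_causality(trace)
corrected = apply_offset(trace, offset)
```

The sketch only removes causality violations; estimating the true skew would need additional information such as round-trip calibration messages.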
A parallel row-based algorithm with error control for standard-cell placement on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Sargent, Jeff Scott

1988-01-01

A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to controlling error in parallel cell-placement algorithms: Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits, and the performance of an implementation on the Intel iPSC/2 Hypercube is summarized. The algorithm runs 5 to 16 times faster than a previous program developed for the Hypercube, while producing placements of equivalent quality. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
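One way to picture the identification of noninteracting cells is as graph coloring, sketched below in Python. The abstract does not specify how Heuristic Cell-Coloring works, so this is an assumption: two cells are treated as interacting when they share a net, and a plain greedy coloring groups cells so that no two cells of the same color share a net. The netlist and the name `color_cells` are hypothetical.

```python
from collections import defaultdict


def color_cells(nets):
    """Greedy coloring of cells; cells with the same color share no net.

    nets: dict mapping a net name to the list of cells it connects.
    Returns a dict mapping each cell to a small integer color.
    """
    neighbors = defaultdict(set)
    for cells in nets.values():
        cells = list(cells)
        for cell in cells:
            # Accessing neighbors[cell] also registers isolated cells.
            neighbors[cell].update(other for other in cells if other != cell)

    colors = {}
    for cell in sorted(neighbors):  # deterministic order for the sketch
        used = {colors[n] for n in neighbors[cell] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[cell] = color
    return colors


# Hypothetical netlist: a-b and b-c interact, d is isolated.
example_nets = {"n1": ["a", "b"], "n2": ["b", "c"], "n3": ["d"]}
example_colors = color_cells(example_nets)
```

Under this assumption, all cells of one color could be moved in the same parallel step without changing the wire-length contribution of each other's nets.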
Work stealing for GPU-accelerated parallel programs in a global address space framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
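The basic work-stealing discipline referred to above can be sketched as a small single-threaded Python simulation. It is a generic illustration, not the paper's CPU-GPU design: each worker owns a double-ended queue, takes its own tasks from the tail, and when idle steals from the head of a randomly chosen non-empty victim. The function name, the round-robin scheduling and the sample workload are assumptions made for the sketch.

```python
import random
from collections import deque


def run_work_stealing(task_lists, seed=0):
    """Simulate work stealing in rounds; returns the tasks run by each worker."""
    rng = random.Random(seed)
    queues = [deque(tasks) for tasks in task_lists]
    executed = [[] for _ in queues]
    while any(queues):  # an empty deque is falsy
        for worker, queue in enumerate(queues):
            if queue:
                task = queue.pop()  # owner works from the tail of its own queue
            else:
                victims = [q for q in queues if q]
                if not victims:
                    continue  # nothing left to steal in this round
                task = rng.choice(victims).popleft()  # thief takes from the head
            executed[worker].append(task)
    return executed


# Hypothetical imbalance: worker 0 starts with all the work, worker 1 with none.
result = run_work_stealing([[1, 2, 3, 4, 5, 6, 7, 8], []])
```

A real implementation would run workers concurrently and would have to weigh the cost of moving task data, which the abstract names as a central concern for CPU-GPU systems.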
Parallel and serial computing tools for testing single-locus and epistatic SNP effects of quantitative traits in genome-wide association studies

PubMed Central

Ma, Li; Runesha, H Birali; Dvorkin, Daniel; Garbe, John R; Da, Yang

2008-01-01

Background: Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS. Results: The EPISNPmpi and EPISNP computer programs were developed for testing single-locus and epistatic SNP effects on quantitative traits in GWAS, including tests of three single-locus effects for each SNP (SNP genotypic effect, additive and dominance effects) and five epistasis effects for each pair of SNPs (two-locus interaction, additive × additive, additive × dominance, dominance × additive, and dominance × dominance) based on the extended Kempthorne model. EPISNPmpi is the parallel computing program for epistasis testing in large scale GWAS and achieved excellent scalability for large scale analysis and portability for various parallel computing platforms. EPISNP is the serial computing program based on the EPISNPmpi code for epistasis testing in small scale GWAS using commonly available operating systems and computer hardware. Three serial computing utility programs were developed for graphical viewing of test results and epistasis networks, and for estimating CPU time and disk space requirements. Conclusion: The EPISNPmpi parallel computing program provides an effective computing tool for epistasis testing in large scale GWAS, and the epiSNP serial computing programs are convenient tools for epistasis analysis in small scale GWAS using commonly available computer hardware. PMID:18644146
The language parallel Pascal and other aspects of the massively parallel processor

NASA Technical Reports Server (NTRS)

Reeves, A. P.; Bruner, J. D.

1982-01-01

A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
IESIP - AN IMPROVED EXPLORATORY SEARCH TECHNIQUE FOR PURE INTEGER LINEAR PROGRAMMING PROBLEMS

NASA Technical Reports Server (NTRS)

Fogle, F. R.

1994-01-01

IESIP, an Improved Exploratory Search Technique for Pure Integer Linear Programming Problems, addresses the problem of optimizing an objective function of one or more variables subject to a set of confining functions or constraints by a method called discrete optimization or integer programming. Integer programming is based on a specific form of the general linear programming problem in which all variables in the objective function and all variables in the constraints are integers. While more difficult, integer programming is required for accuracy when modeling systems with small numbers of components such as the distribution of goods, machine scheduling, and production scheduling. IESIP establishes a new methodology for solving pure integer programming problems by utilizing a modified version of the univariate exploratory move developed by Robert Hooke and T.A. Jeeves. IESIP also takes some of its technique from the greedy procedure and the idea of unit neighborhoods. A rounding scheme uses the continuous solution found by traditional methods (simplex or other suitable technique) and creates a feasible integer starting point. The Hooke and Jeeves exploratory search is modified to accommodate integers and constraints and is then employed to determine an optimal integer solution from the feasible starting solution. The user-friendly IESIP allows for rapid solution of problems up to 10 variables in size (limited by DOS allocation). Sample problems compare IESIP solutions with the traditional branch-and-bound approach. IESIP is written in Borland's TURBO Pascal for IBM PC series computers and compatibles running DOS. Source code and an executable are provided. The main memory requirement for execution is 25K. This program is available on a 5.25 inch 360K MS DOS format diskette. IESIP was developed in 1990. IBM is a trademark of International Business Machines. TURBO Pascal is registered by Borland International.
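The general pattern described above (round a continuous solution to a feasible integer point, then try unit moves along one variable at a time) can be sketched in Python as follows. This is not the IESIP code, which is written in TURBO Pascal and whose details the abstract does not give. The sketch assumes a maximization problem with constraints of the form A x <= b and x >= 0, rounds by taking the floor, and is a local search that is not guaranteed to reach the global integer optimum. All names and the sample problem are hypothetical.

```python
import math


def is_feasible(x, A, b):
    """Check x >= 0 and A x <= b for integer vector x."""
    return all(v >= 0 for v in x) and all(
        sum(a * v for a, v in zip(row, x)) <= rhs for row, rhs in zip(A, b)
    )


def objective(c, x):
    return sum(ci * xi for ci, xi in zip(c, x))


def exploratory_search(c, A, b, continuous_solution):
    """Maximize c.x over integers by unit moves from a rounded starting point."""
    x = [math.floor(v) for v in continuous_solution]  # assumed rounding scheme
    if not is_feasible(x, A, b):
        raise ValueError("rounded starting point is infeasible")
    best = objective(c, x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            for step in (1, -1):  # univariate unit moves
                trial = x[:]
                trial[i] += step
                if is_feasible(trial, A, b) and objective(c, trial) > best:
                    x, best = trial, objective(c, trial)
                    improved = True
    return x, best


# Hypothetical problem: maximize x + 2y subject to x + y <= 5 and y <= x.
# Its continuous optimum is (2.5, 2.5), which is passed in as the starting point.
solution, value = exploratory_search(
    c=[1, 2], A=[[1, 1], [-1, 1]], b=[5, 0], continuous_solution=[2.5, 2.5]
)
```

For this small problem the search moves from the rounded point (2, 2) to (3, 2); on other problems a unit-move search of this kind can stop at a point that is only locally best.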
Automatic recognition of vector and parallel operations in a higher level language

NASA Technical Reports Server (NTRS)

Schneck, P. B.

1971-01-01

A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.

Understanding and Improving High-Performance I/O Subsystems

NASA Technical Reports Server (NTRS)

El-Ghazawi, Tarek A.; Frieder, Gideon; Clark, A. James

1996-01-01

This research program has been conducted in the framework of the NASA Earth and Space Science (ESS) evaluations led by Dr. Thomas Sterling. In addition to the many important research findings for NASA and the prestigious publications, the program has helped orient the doctoral research of two students towards parallel input/output in high-performance computing. Further, the experimental results for the MasPar were very useful to MasPar, with whose technical management the P.I. has had many interactions. The contributions of this program are drawn from three experimental studies conducted on different high-performance computing testbeds/platforms, and are therefore presented in three segments as follows: 1. Evaluating the parallel input/output subsystems of NASA high-performance computing testbeds, namely the MasPar MP-1 and MP-2; 2. Characterizing the physical input/output request patterns for NASA ESS applications, which used the Beowulf platform; and 3. Dynamic scheduling techniques for hiding I/O latency in parallel applications such as sparse matrix computations. This study has also been conducted on the Intel Paragon and has provided an experimental evaluation of the Parallel File System (PFS) and parallel input/output on the Paragon. This report is organized as follows. The summary of findings discusses the results of each of the three aforementioned studies. Three appendices, each containing a key scholarly research paper that details the work in one of the studies, are included.
Methodologies and Tools for Tuning Parallel Programs: 80% Art, 20% Science, and 10% Luck

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Bailey, David (Technical Monitor)

1996-01-01

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. In the past few years, the ubiquitous introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance instrumentation/monitor/tuning technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g. AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.
Parallel Computing for Probabilistic Response Analysis of High Temperature Composites

NASA Technical Reports Server (NTRS)

Sues, R. H.; Lua, Y. J.; Smith, M. D.

1994-01-01

The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
Experimental Aerodynamic Derivatives of a Sinusoidally Oscillating Airfoil in Two-Dimensional Flow

NASA Technical Reports Server (NTRS)

Halfman, Robert L

1952-01-01

Experimental measurements of the aerodynamic reactions on a symmetrical airfoil oscillating harmonically in a two-dimensional flow are presented and analyzed. Harmonic motions include pure pitch and pure translation, for several amplitudes and superimposed on an initial angle of attack, as well as combined pitch and translation. The apparatus and testing program are described briefly and the necessary theoretical background is presented. In general, the experimental results agree remarkably well with the theory, especially in the case of the pure motions. The net work per cycle for a motion corresponding to flutter is experimentally determined to be zero. Considerable consistent data for pure pitch were obtained from a search of available reference material, and several definite Reynolds number effects are evident.
Real-Time MENTAT programming language and architecture

NASA Technical Reports Server (NTRS)

Grimshaw, Andrew S.; Silberman, Ami; Liu, Jane W. S.

1989-01-01

Real-time MENTAT, a programming environment designed to simplify the task of programming real-time applications in distributed and parallel environments, is described. It is based on the same data-driven computation model and object-oriented programming paradigm as MENTAT. It provides an easy-to-use mechanism to exploit parallelism, language constructs for the expression and enforcement of timing constraints, and run-time support for scheduling and executing real-time programs. The real-time MENTAT programming language is an extended C++. The extensions are added to facilitate automatic detection of data flow and generation of data flow graphs, to express the timing constraints of individual granules of computation, and to provide scheduling directives for the runtime system. A high-level view of the real-time MENTAT system architecture and programming language constructs is provided.
Computational strategies for three-dimensional flow simulations on distributed computer systems. Ph.D. Thesis Semiannual Status Report, 15 Aug. 1993 - 15 Feb. 1994

NASA Technical Reports Server (NTRS)

Weed, Richard Allen; Sankar, L. N.

1994-01-01

An increasing amount of research activity in computational fluid dynamics has been devoted to the development of efficient algorithms for parallel computing systems. The increasing performance to price ratio of engineering workstations has led to research to develop procedures for implementing a parallel computing system composed of distributed workstations. This thesis proposal outlines an ongoing research program to develop efficient strategies for performing three-dimensional flow analysis on distributed computing systems. The PVM parallel programming interface was used to modify an existing three-dimensional flow solver, the TEAM code developed by Lockheed for the Air Force, to function as a parallel flow solver on clusters of workstations. Steady flow solutions were generated for three different wing and body geometries to validate the code and evaluate code performance. The proposed research will extend the parallel code development to determine the most efficient strategies for unsteady flow simulations.
Communication library for run-time visualization of distributed, asynchronous data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rowlan, J.; Wightman, B.T.

1994-04-01

In this paper we present a method for collecting and visualizing data generated by a parallel computational simulation during run time. Data distributed across multiple processes is sent across parallel communication lines to a remote workstation, which sorts and queues the data for visualization. We have implemented our method in a set of tools called PORTAL (for Parallel aRchitecture data-TrAnsfer Library). The tools comprise generic routines for sending data from a parallel program (callable from either C or FORTRAN), a semi-parallel communication scheme currently built upon Unix Sockets, and a real-time connection to the scientific visualization program AVS. Our method is most valuable when used to examine large datasets that can be efficiently generated and do not need to be stored on disk. The PORTAL source libraries, detailed documentation, and a working example can be obtained by anonymous ftp from info.mcs.anl.gov from the file portal.tar.Z from the directory pub/portal.
Multiprogramming performance degradation - Case study on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Dimpsey, R. T.; Iyer, R. K.

1989-01-01

The performance degradation due to multiprogramming overhead is quantified for a parallel-processing machine. Measurements of real workloads were taken, and it was found that there is a moderate correlation between the completion time of a program and the amount of system overhead measured during program execution. Experiments in controlled environments were then conducted to calculate a lower bound on the performance degradation of parallel jobs caused by multiprogramming overhead. The results show that the multiprogramming overhead of parallel jobs consumes at least 4 percent of the processor time. When two or more serial jobs are introduced into the system, this amount increases to 5.3 percent.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zuo, Wangda; McNeil, Andrew; Wetter, Michael

2011-09-06

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP test.

PubMed

O'Connor, B P

2000-08-01

Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.
Panel discussion: prescribed burning in the 21st century

Treesearch

Jerry Hurley; Ishmael Messer; Stephen J. Botti; Jay Perkins; L. Dean Clark

1995-01-01

Even though many legal, social, and organizational constraints affect prescribed fire programs, the ecological and social benefits of such programs encourage their continued existence (with or without modification). The form of these programs in the next 10 to 50 years is pure speculation; but we must speculate and project the programs, as well as associated benefits...
TV And Your Child; In Search of an Answer.

ERIC Educational Resources Information Center

Pannitt, Merrill, Ed.

Like all television programing, programs for children are aimed to produce profit. Since cartoon shows are inexpensive, they are staples of children's television. These programs can offer sponsors a pure, undifferentiated audience at which to aim commercials for toys and breakfast cereals. In addition to cartoon shows, children watch "Sesame…
As-built design specification for PARCLS

NASA Technical Reports Server (NTRS)

Tompkins, M. A. (Principal Investigator)

1981-01-01

The PARCLS program, part of the CLASFYG package, reads a parameter file created by the CLASFYG program and a pure pixel ground truth file in order to create a classification file of three separate crop categories in universal format.
Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

NASA Technical Reports Server (NTRS)

Harper, Richard

1989-01-01

In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.
The revised solar array synthesis computer program

NASA Technical Reports Server (NTRS)

1970-01-01

The Revised Solar Array Synthesis Computer Program is described. It is a general-purpose program which computes solar array output characteristics while accounting for the effects of temperature, incidence angle, charged-particle irradiation, and other degradation effects on various solar array configurations in either circular or elliptical orbits. Array configurations may consist of up to 75 solar cell panels arranged in any series-parallel combination not exceeding three series-connected panels in a parallel string and no more than 25 parallel strings in an array. Up to 100 separate solar array current-voltage characteristics, corresponding to 100 equal-time increments during the sunlight illuminated portion of an orbit or any 100 user-specified combinations of incidence angle and temperature, can be computed and printed out during one complete computer execution. Individual panel incidence angles may be computed and printed out at the user's option.
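The configuration limits quoted in this abstract can be written down as a small validity check. The Python sketch below is illustrative only and is not taken from the program itself: the function name `check_array_config` and the representation of an array as a list of per-string panel counts are assumptions made for this example.

```python
MAX_PANELS = 75            # total panels per array (from the abstract)
MAX_SERIES_PER_STRING = 3  # series-connected panels per parallel string
MAX_PARALLEL_STRINGS = 25  # parallel strings per array


def check_array_config(strings):
    """Return a list of limit violations for a series-parallel layout.

    `strings` is a hypothetical representation: one integer per parallel
    string, giving the number of series-connected panels in that string.
    """
    problems = []
    if len(strings) > MAX_PARALLEL_STRINGS:
        problems.append("too many parallel strings")
    if any(n > MAX_SERIES_PER_STRING for n in strings):
        problems.append("too many series panels in a string")
    if any(n < 1 for n in strings):
        problems.append("empty string")
    if sum(strings) > MAX_PANELS:
        problems.append("too many panels")
    return problems


if __name__ == "__main__":
    print(check_array_config([3] * 25))  # the largest layout the limits allow
    print(check_array_config([4, 2]))    # one string is too long
```

Since 25 strings of 3 panels is exactly 75 panels, the total-panel check in this sketch can only fail together with one of the other two limits; it is kept separate because the abstract states it separately.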
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D

NASA Technical Reports Server (NTRS)

Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

1994-01-01

The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D. We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.
Mn0.9Co0.1P in field parallel to hard direction: phase diagram and irreversibility of CONE phase

NASA Astrophysics Data System (ADS)

Zieba, A.; Becerra, C. C.; Oliveira, N. F.; Fjellvåg, H.; Kjekshus, A.

1992-02-01

A single crystal of Mn0.9Co0.1P, a homologue of MnP with disordered metal sublattice, has been studied by the ac susceptibility method in a steady field H. This report concerns H parallel to the orthorhombic a axis (a > b > c). The magnetic phase diagram is qualitatively similar to that of MnP, including the presence of a Lifshitz multicritical point (TL = 98 K, HL = 42 kOe) at the confluence of the paramagnetic, ferromagnetic and modulated FAN phases. Contrary to pure MnP, irreversible behaviour was observed in the susceptibility of the modulated CONE phase. This phenomenon develops only for fields above 30 kOe, in contrast to the irreversibility of the FAN phase (reported previously for H ‖ b in the whole field range down to H = 0). New features of the presumably continuous CONE-FAN transition were also found.
Parallel processing by cortical inhibition enables context-dependent behavior.

PubMed

Kuchibhotla, Kishore V; Gill, Jonathan V; Lindsay, Grace W; Papadoyannis, Eleni S; Field, Rachel E; Sten, Tom A Hindmarsh; Miller, Kenneth D; Froemke, Robert C

2017-01-01

Physical features of sensory stimuli are fixed, but sensory perception is context dependent. The precise mechanisms that govern contextual modulation remain unknown. Here, we trained mice to switch between two contexts: passively listening to pure tones and performing a recognition task for the same stimuli. Two-photon imaging showed that many excitatory neurons in auditory cortex were suppressed during behavior, while some cells became more active. Whole-cell recordings showed that excitatory inputs were affected only modestly by context, but inhibition was more sensitive, with PV+, SOM+, and VIP+ interneurons balancing inhibition and disinhibition within the network. Cholinergic modulation was involved in context switching, with cholinergic axons increasing activity during behavior and directly depolarizing inhibitory cells. Network modeling captured these findings, but only when modulation coincidently drove all three interneuron subtypes, ruling out either inhibition or disinhibition alone as sole mechanism for active engagement. Parallel processing of cholinergic modulation by cortical interneurons therefore enables context-dependent behavior.
Orientation dependence of the dislocation microstructure in compressed body-centered cubic molybdenum

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, S.; Wang, M.P.; Chen, C., E-mail: chench011-33@163.com

2014-05-01

The orientation dependence of the deformation microstructure has been investigated in commercially pure molybdenum. After deformation, the dislocation boundaries of compressed molybdenum can be classified, similar to those in face-centered cubic metals, into three types: dislocation cells (Type 2), and extended planar boundaries parallel to (Type 1) or not parallel to (Type 3) a (110) trace. However, the orientation dependence of the deformation microstructure shows a reciprocal relationship between face-centered cubic and body-centered cubic metals. The higher the strain, the finer the microstructure and the smaller the inclination angle between extended planar boundaries and the compression axis. - Highlights: • A reciprocal relationship between FCC metals and BCC metals is confirmed. • The dislocation boundaries can be classified into three types in compressed Mo. • The dislocation characteristic of different dislocation boundaries is different.
Parallel computation and the Basis system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, G.R.

1992-12-16

A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communication costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.
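The master-and-slaves pattern with domain decomposition described in this abstract can be sketched in a few lines. The Python example below is a minimal illustration under stated assumptions, not PROTOPAR code: it uses threads from the standard library in place of PVM message passing, and the names `split_domain`, `worker`, and `master`, as well as the sum-of-squares workload, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_domain(n_cells, n_workers):
    """Split cell indices 0..n_cells-1 into contiguous subdomains."""
    base, extra = divmod(n_cells, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers take one additional cell each.
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def worker(field, start, stop):
    """Work on one subdomain; here, just sum the squares of its cells."""
    return sum(x * x for x in field[start:stop])


def master(field, n_workers=4):
    """Hand one subdomain to each worker and combine the partial results."""
    bounds = split_domain(len(field), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda b: worker(field, *b), bounds)
        return sum(partials)


if __name__ == "__main__":
    print(split_domain(10, 3))
    print(master(list(range(10)), n_workers=3))
```

In a message-passing setting each worker would hold only its own slice of `field` in local memory; the shared list here is a simplification that keeps the sketch short.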

Orthorectification by Using Gpgpu Method

NASA Astrophysics Data System (ADS)

Sahin, H.; Kulur, S.

2012-07-01

Thanks to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power of more than a teraflop. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors with very fast computing capabilities and high memory bandwidth compared to central processing units (CPUs). Data-parallel computation can be briefly described as mapping data elements to parallel processing threads. The rapid development of GPU programmability and capabilities has attracted the attention of researchers dealing with complex problems that need intensive calculation. This interest has given rise to the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". Graphics processors are powerful yet inexpensive hardware, so they have become an alternative to conventional processors. Graphics chips, which were once standard application hardware, have been transformed into modern, powerful, programmable processors to meet general needs. Especially in recent years, the use of graphics processing units in general-purpose computation has led researchers and developers to this point. The biggest problem is that graphics processing units use programming models that differ from current programming methods. Therefore, efficient GPU programming requires re-coding the current program algorithm with the limitations and structure of the graphics hardware in mind. Currently, multi-core processors cannot be programmed by using traditional programming methods. The event procedure programming method cannot be used for programming multi-core processors. GPUs are especially effective at repeating the same computing steps for many data elements when high accuracy is needed, and so they complete the computation more quickly and accurately. Compared to GPUs, CPUs, which perform just one computation at a time according to flow control, are slower. This structure can be evaluated for various applications of computer technology. This study covers how general-purpose parallel programming and the computational power of GPUs can be used in photogrammetric applications, especially direct georeferencing. The direct georeferencing algorithm is coded using the GPGPU method and the CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with traditional CPU programming. In the other application, projective rectification is coded using the GPGPU method and the CUDA programming language. Sample images of various sizes were evaluated and compared with the results of the program. The GPGPU method is especially suited to repeating the same computations on highly dense data, thus finding the solution quickly.
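The idea of mapping data elements to parallel threads can be illustrated with projective rectification, where every pixel coordinate is transformed independently by the same 3x3 matrix. The sketch below is an assumption-laden illustration rather than the authors' CUDA code: it is written in Python, uses a standard-library thread pool as a stand-in for GPU threads, and the names `rectify_point` and `rectify_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def rectify_point(h, point):
    """Apply a 3x3 projective transform `h` to one (x, y) pixel coordinate."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    # Divide by the homogeneous coordinate to return to image coordinates.
    return (u / w, v / w)


def rectify_all(h, points, n_threads=4):
    """Transform every point independently, one task per data element."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda p: rectify_point(h, p), points))


if __name__ == "__main__":
    h = [[2, 0, 1], [0, 2, 0], [0, 0, 2]]  # made-up transform for the demo
    print(rectify_all(h, [(2, 4), (0, 0)]))
```

Because no point depends on any other, the same per-element function could be launched as one GPU thread per pixel. Python threads show the mapping only; they are not expected to give a comparable speed-up.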
Three pillars for achieving quantum mechanical molecular dynamics simulations of huge systems: Divide-and-conquer, density-functional tight-binding, and massively parallel computation.

PubMed

Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi

2016-08-05

The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented in the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program: a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc.
Automated Performance Prediction of Message-Passing Parallel Programs

NASA Technical Reports Server (NTRS)

Block, Robert J.; Sarukkai, Sekhar; Mehra, Pankaj; Woodrow, Thomas S. (Technical Monitor)

1995-01-01

The increasing use of massively parallel supercomputers to solve large-scale scientific problems has generated a need for tools that can predict scalability trends of applications written for these machines. Much work has been done to create simple models that represent important characteristics of parallel programs, such as latency, network contention, and communication volume. But many of these methods still require substantial manual effort to represent an application in the model's format. The MK toolkit described in this paper is the result of an on-going effort to automate the formation of analytic expressions of program execution time, with a minimum of programmer assistance. In this paper we demonstrate the feasibility of our approach, by extending previous work to detect and model communication patterns automatically, with and without overlapped computations. The predictions derived from these models agree, within reasonable limits, with execution times of programs measured on the Intel iPSC/860 and Paragon. Further, we demonstrate the use of MK in selecting optimal computational grain size and studying various scalability metrics.
Connectionist Models and Parallelism in High Level Vision.

DTIC Science & Technology

1985-01-01

GRANT NUMBER(s) Jerome A. Feldman N00014-82-K-0193 9. PERFORMING ORGANIZATION NAME AND ADDRESS 10. PROGRAM ELEMENT, PROJECT, TASK Computer Science...Connectionist Models 2.1 Background and Overview Computer science is just beginning to look seriously at parallel computation: it may turn out that...the chair. The program includes intermediate level networks that compute more complex joints and ones that compute parallelograms in the image. These
Transient Finite Element Computations on a Variable Transputer System

NASA Technical Reports Server (NTRS)

Smolinski, Patrick J.; Lapczyk, Ireneusz

1993-01-01

A parallel program to analyze transient finite element problems was written and implemented on a system of transputer processors. The program uses the explicit time integration algorithm which eliminates the need for equation solving, making it more suitable for parallel computations. An interprocessor communication scheme was developed for arbitrary two dimensional grid processor configurations. Several 3-D problems were analyzed on a system with a small number of processors.
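The remark that explicit time integration removes the need for equation solving can be made concrete with a small example. The Python sketch below is an assumption, not the authors' program: it models a fixed-free chain of lumped masses and springs and uses a simple explicit velocity-then-displacement update, since the abstract does not say which explicit scheme was used. The name `explicit_step` and all parameter values are hypothetical.

```python
def explicit_step(u, v, masses, stiffness, dt):
    """Advance a fixed-free 1D spring chain by one explicit time step.

    u, v: nodal displacements and velocities; masses: lumped nodal masses;
    stiffness[i]: spring constant of element i, which joins node i-1
    (or the fixed wall when i == 0) to node i.
    """
    n = len(u)
    force = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        f = stiffness[i] * (u[i] - left)  # force carried by element i
        force[i] -= f
        if i > 0:
            force[i - 1] += f
    # Lumped (diagonal) mass: each acceleration is a single division,
    # so no system of equations has to be solved.
    v_new = [v[i] + dt * force[i] / masses[i] for i in range(n)]
    u_new = [u[i] + dt * v_new[i] for i in range(n)]
    return u_new, v_new


if __name__ == "__main__":
    u, v = [1.0], [0.0]  # single mass released from rest
    for _ in range(1000):
        u, v = explicit_step(u, v, [1.0], [1.0], 0.001)
    print(u[0])  # close to cos(1) for this unit mass and unit spring
```

Each node's update needs only its own force and mass, which is why such schemes split naturally across processors: a processor holding a block of nodes needs just the boundary displacements of its neighbours at every step.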
Scheduling for Locality in Shared-Memory Multiprocessors

DTIC Science & Technology

1993-05-01

Submitted in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY Supervised by... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to...decomposition and scheduling algorithms. SUBJECT TERMS: shared-memory multiprocessors; architecture trends; loop scheduling. NUMBER OF PAGES: 110
An Empirical Development of Parallelization Guidelines for Time-Driven Simulation

DTIC Science & Technology

1989-12-01

wives, who though not Cub fans, put on a good show during our trip to watch some games. I would also like to recognize the help of my professors at...program parallelization. In this research effort a Ballistic Missile Defense (BMD) time driven simulation program, developed by DESE Research and...continuously, or continuously with discrete changes superimposed. The distinguishing feature of these simulations is the interaction between discretely
Force user's manual, revised

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

1987-01-01

A methodology for writing parallel programs for shared memory multiprocessors has been formalized as an extension to the Fortran language and implemented as a macro preprocessor. The extended language is known as the Force, and this manual describes how to write Force programs and execute them on the Flexible Computer Corporation Flex/32, the Encore Multimax and the Sequent Balance computers. The parallel extension macros are described in detail, but knowledge of Fortran is assumed.
Parallel Programming Paradigms

DTIC Science & Technology

1987-07-01

Unclassified DECLASSIFICATION/DOWNGRADING 16. DISTRIBUTION STATEMENT (of this Report) Distribution of this report is unlimited. 17...8416878 and by the Office of Naval Research Contracts No. N00014-86-K-0264 and No. N00014-85-K-0328. Parallel Programming Paradigms...processors to fetch from the same memory cell (list head) and thus seems to favor a shared memory implementation [37]. In this dissertation, we
Dynamic programming in parallel boundary detection with application to ultrasound intima-media segmentation.

PubMed

Zhou, Yuan; Cheng, Xinyao; Xu, Xiangyang; Song, Enmin

2013-12-01

Segmentation of carotid artery intima-media in longitudinal ultrasound images for measuring its thickness to predict cardiovascular diseases can be simplified as detecting two nearly parallel boundaries within a certain distance range, when plaque with irregular shapes is not considered. In this paper, we improve the implementation of two dynamic programming (DP) based approaches to parallel boundary detection, dual dynamic programming (DDP) and piecewise linear dual dynamic programming (PL-DDP). Then, a novel DP based approach, dual line detection (DLD), which translates the original 2-D curve position to a 4-D parameter space representing two line segments in a local image segment, is proposed to solve the problem while maintaining efficiency and rotation invariance. To apply the DLD to ultrasound intima-media segmentation, it is imbedded in a framework that employs an edge map obtained from multiplication of the responses of two edge detectors with different scales and a coupled snake model that simultaneously deforms the two contours for maintaining parallelism. The experimental results on synthetic images and carotid arteries of clinical ultrasound images indicate improved performance of the proposed DLD compared to DDP and PL-DDP, with respect to accuracy and efficiency. Copyright © 2013 Elsevier B.V. All rights reserved.
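The dynamic programming idea underlying these boundary detectors can be shown on a much simpler problem. The Python sketch below is a simplification and not the DDP, PL-DDP, or DLD method of the paper: it traces a single boundary through a small cost image, choosing one row per column and limiting the row change between neighbouring columns. The function name `trace_boundary` and the `max_jump` parameter are assumptions for this example.

```python
def trace_boundary(cost, max_jump=1):
    """Return the row index per column of the minimum-cost left-to-right path.

    cost[r][c] is the penalty for placing the boundary at row r in column c
    (for example, low where an edge detector responds strongly).
    """
    n_rows, n_cols = len(cost), len(cost[0])
    total = [[cost[r][0]] + [0.0] * (n_cols - 1) for r in range(n_rows)]
    back = [[0] * n_cols for _ in range(n_rows)]
    for c in range(1, n_cols):
        for r in range(n_rows):
            lo, hi = max(0, r - max_jump), min(n_rows - 1, r + max_jump)
            # Best predecessor among the rows reachable from row r.
            best = min(range(lo, hi + 1), key=lambda p: total[p][c - 1])
            total[r][c] = cost[r][c] + total[best][c - 1]
            back[r][c] = best
    # Start from the cheapest end point and follow the back-pointers.
    r = min(range(n_rows), key=lambda p: total[p][n_cols - 1])
    path = [r]
    for c in range(n_cols - 1, 0, -1):
        r = back[r][c]
        path.append(r)
    return path[::-1]


if __name__ == "__main__":
    cost = [[5, 5, 1],
            [1, 1, 5],
            [5, 5, 5]]
    print(trace_boundary(cost))
```

Detecting two nearly parallel boundaries at once, as the abstract describes, enlarges the state from one row per column to a pair of rows (or, for DLD, to line-segment parameters); the recurrence keeps the same shape.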
A neural-network-based approach to the double traveling salesman problem.

PubMed

Plebe, Alessio; Anile, Angelo Marcello

2002-02-01

The double traveling salesman problem is a variation of the basic traveling salesman problem where targets can be reached by two salespersons operating in parallel. The real problem addressed by this work concerns the optimization of the harvest sequence for the two independent arms of a fruit-harvesting robot. This application poses further constraints, like a collision-avoidance function. The proposed solution is based on a self-organizing map structure, initialized with as many artificial neurons as the number of targets to be reached. One of the key components of the process is the combination of competitive relaxation with a mechanism for deleting and creating artificial neurons. Moreover, in the competitive relaxation process, information about the trajectory connecting the neurons is combined with the distance of neurons from the target. This strategy prevents tangles in the trajectory and collisions between the two tours. Results of tests indicate that the proposed approach is efficient and reliable for harvest sequence planning. Moreover, the enhancements added to the pure self-organizing map concept are of wider importance, as proved by a traveling salesman problem version of the program, simplified from the double version for comparison.
ALPS: A Linear Program Solver

NASA Technical Reports Server (NTRS)

Ferencz, Donald C.; Viterna, Larry A.

1991-01-01

ALPS is a computer program which can be used to solve general linear program (optimization) problems. ALPS was designed for those who have minimal linear programming (LP) knowledge and features a menu-driven scheme to guide the user through the process of creating and solving LP formulations. Once created, the problems can be edited and stored in standard DOS ASCII files to provide portability to various word processors or even other linear programming packages. Unlike many math-oriented LP solvers, ALPS contains an LP parser that reads through the LP formulation and reports several types of errors to the user. ALPS provides a large amount of solution data which is often useful in problem solving. In addition to pure linear programs, ALPS can solve for integer, mixed integer, and binary type problems. Pure linear programs are solved with the revised simplex method. Integer or mixed integer programs are solved initially with the revised simplex method, and then completed using the branch-and-bound technique. Binary programs are solved with the method of implicit enumeration. This manual describes how to use ALPS to create, edit, and solve linear programming problems. Instructions for installing ALPS on a PC compatible computer are included in the appendices along with a general introduction to linear programming. A programmer's guide is also included for assistance in modifying and maintaining the program.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Shipman, Galen M.

These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematic approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.
Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

NASA Astrophysics Data System (ADS)

Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

2017-08-01

We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on users parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. 
Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU data processing time)
Language and System Support for Concurrent Programming

DTIC Science & Technology

1990-04-01

language. We give suggestions on how to avoid polling programs, and suggest changes to the rendezvous facilities to eliminate the polling bias. The...concerned with support for concurrent programming provided to the application programmer by operating systems and programming...of concurrent programming has widened from "pure" operating system applications to a multitude of real-time and distributed programs. Since
Software For Integer Programming

NASA Technical Reports Server (NTRS)

Fogle, F. R.

1992-01-01

Improved Exploratory Search Technique for Pure Integer Linear Programming Problems (IESIP) program optimizes objective function of variables subject to confining functions or constraints, using discrete optimization or integer programming. Enables rapid solution of problems up to 10 variables in size. Integer programming required for accuracy in modeling systems containing small number of components, distribution of goods, scheduling operations on machine tools, and scheduling production in general. Written in Borland's TURBO Pascal.
ACHIEVEMENT OF STUDENTS FROM GROUPS INSTRUCTED BY PROGRAMED MATERIALS, CLASSROOM TEACHER, OR BOTH. COMPARATIVE STUDIES OF PRINCIPLES FOR PROGRAMMING MATHEMATICS IN AUTOMATED INSTRUCTION, TECHNICAL REPORT NO. 12.

ERIC Educational Resources Information Center

BROWN, O. ROBERT, JR.

THE EXPERIMENTAL DESIGN IN THIS STUDY OF THE USE OF PROGRAMED MATERIALS TO TEACH HIGH SCHOOL MATHEMATICS DESIGNATED FOUR GROUPS--A CONTROL GROUP TAUGHT CONVENTIONALLY BY TEACHERS TRAINED TO USE PROGRAMED MATERIALS, A "PURE" GROUP USING PROGRAMED MATERIALS ONLY, AND "ANTICIPATING" AND "FOLLOWING" GROUPS THAT USED…
Molecular dynamics studies of transport properties and equation of state of supercritical fluids

NASA Astrophysics Data System (ADS)

Nwobi, Obika C.

Many chemical propulsion systems operate with one or more of the reactants above the critical point in order to enhance their performance. Most of the computational fluid dynamics (CFD) methods used to predict these flows require accurate information on the transport properties and equation of state at these supercritical conditions. This work involves the determination of transport coefficients and equation of state of supercritical fluids by equilibrium molecular dynamics (MD) simulations on parallel computers using the Green-Kubo formulae and the virial equation of state, respectively. MD involves the solution of equations of motion of a system of molecules that interact with each other through an intermolecular potential. Provided that an accurate potential can be found for the system of interest, MD can be used regardless of the phase and thermodynamic conditions of the substances involved. The MD program uses the effective Lennard-Jones potential, with system sizes of 1000-1200 molecules and, simulations of 2,000,000 time-steps for computing transport coefficients and 200,000 time-steps for pressures. The computer code also uses linked cell lists for efficient sorting of molecules, periodic boundary conditions, and a modified velocity Verlet algorithm for particle displacement. Particle decomposition is used for distributing the molecules to different processors of a parallel computer. Simulations have been carried out on pure argon, nitrogen, oxygen and ethylene at various supercritical conditions, with self-diffusion coefficients, shear viscosity coefficients, thermal conductivity coefficients and pressures computed for most of the conditions. Results compare well with experimental and the National Institute of Standards and Technology (NIST) values. 
The results show that the number of molecules and the potential cut-off radius have no significant effect on the computed coefficients, while long-time integration is necessary for accurate determination of the coefficients.
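As a rough illustration of the integration scheme named in this abstract, the sketch below advances two Lennard-Jones particles on a line with the standard velocity Verlet algorithm in reduced units. It is an assumption-laden toy: the thesis uses a modified velocity Verlet, linked cell lists, periodic boundaries and 1000-1200 molecules, none of which appear here, and the names `lj_force_and_energy` and `simulate_pair` are hypothetical.

```python
from typing import List, Tuple


def lj_force_and_energy(
    r: float, epsilon: float = 1.0, sigma: float = 1.0
) -> Tuple[float, float]:
    """Lennard-Jones pair force (positive means repulsive) and potential energy."""
    sr6 = (sigma / r) ** 6
    energy = 4.0 * epsilon * (sr6 * sr6 - sr6)
    force = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
    return force, energy


def simulate_pair(
    steps: int, dt: float, separation: float = 1.5, mass: float = 1.0
) -> List[float]:
    """Integrate two particles on a line; return the total energy after each step."""
    x1, x2 = 0.0, separation
    v1, v2 = 0.0, 0.0
    force, potential = lj_force_and_energy(x2 - x1)
    energies = []
    for _ in range(steps):
        # First half-kick: a repulsive force pushes particle 2 to +x, particle 1 to -x.
        v1 -= 0.5 * dt * force / mass
        v2 += 0.5 * dt * force / mass
        # Drift with the half-updated velocities.
        x1 += dt * v1
        x2 += dt * v2
        # Recompute the force at the new positions, then the second half-kick.
        force, potential = lj_force_and_energy(x2 - x1)
        v1 -= 0.5 * dt * force / mass
        v2 += 0.5 * dt * force / mass
        kinetic = 0.5 * mass * (v1 * v1 + v2 * v2)
        energies.append(kinetic + potential)
    return energies
```

Checking that the returned energies stay nearly constant for a small time step (for example `simulate_pair(5000, 0.001)`) is a quick sanity test of the integrator before any transport coefficients are computed.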
On the temperature-programmed reduction of Pt-Ir/γ-Al2O3 catalysts

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wagstaff, N.; Prins, R.

1979-10-15

Temperature-programmed reduction of a catalyst containing 0.37% Pt and 0.37% Ir on chlorided alumina and treated as previously described for a Pt-Re bimetallic catalyst showed a single reduction peak at 105 °C, almost exactly at the midpoint between the reduction peaks of the pure platinum and pure iridium catalysts treated identically. This peak remained unaltered after fairly severe oxidation treatment (350 °C). The results indicated that the catalyst formed bimetallic clusters in the reduced state which were more stable than the Pt-Re clusters and did not segregate on oxidation.
Use Computer-Aided Tools to Parallelize Large CFD Applications

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Yan, J.

2000-01-01

Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progress in hardware architectures and increasing complexity of real applications in recent years, the problem becomes even more severe. Today, scalability and high performance mostly involve handwritten parallel programs using message-passing libraries (e.g. MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, shows good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance. In the worst cases, they can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often fails on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome the deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization.
CAPO is aimed at taking advantage of detailed inter-procedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on NAS Benchmarks and ARC3D have demonstrated good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving Navier-Stokes equations with complicated boundary conditions and turbulence models in multiple zones. Each one comprises 50K to 100K lines of FORTRAN77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175MHz, R10K processor). A fair amount of effort was spent on correcting false dependencies due to lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for the user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Due to sequential algorithms involved, code sections in TLNS3D and INS3D need to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in single zone. The MPI data points for the small test case were taken from a hand-coded MPI version. As we can see, CAPO's version has achieved an 18-fold speedup on 32 nodes of the SGI O2K. For the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outermost parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks the support of parallelization at the multi-zone level.
Future work will emphasize the development of a methodology to work at the multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformations is also needed.
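The reported 18-fold speedup on 32 nodes can be put in context with two standard metrics: parallel efficiency and the Karp-Flatt experimentally determined serial fraction. Neither metric is used in the abstract itself; the sketch below is only an assumed way of reading the figure, with hypothetical function names.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup: float, processors: int) -> float:
    """Karp-Flatt metric: the serial fraction implied by a measured speedup."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# Figures quoted in the abstract: 18-fold speedup on 32 nodes.
efficiency = parallel_efficiency(18.0, 32)              # 18 / 32
serial_fraction = karp_flatt_serial_fraction(18.0, 32)  # roughly 2.5 percent
```

Under these definitions the run reaches a little over half of ideal efficiency, and the implied serial fraction is about 2.5 percent of the work.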

Fabric analysis of quartzites with negative magnetic susceptibility - Does AMS provide information of SPO or CPO of quartz?

NASA Astrophysics Data System (ADS)

Renjith, A. R.; Mamtani, Manish A.; Urai, Janos L.

2016-01-01

We ask whether petrofabric data from anisotropy of magnetic susceptibility (AMS) analysis of deformed quartzites gives information about shape preferred orientation (SPO) or crystallographic preferred orientation (CPO) of quartz. Since quartz is diamagnetic and has a negative magnetic susceptibility, 11 samples of nearly pure quartzites with a negative magnetic susceptibility were chosen for this study. After performing AMS analysis, electron backscatter diffraction (EBSD) analysis was done in thin sections prepared parallel to the K1K3 plane of the AMS ellipsoid. Results show that in all the samples quartz SPO is sub-parallel to the orientation of the magnetic foliation. However, in most samples no clear correspondence is observed between quartz CPO and K1 (magnetic lineation) direction. This is contrary to the parallelism observed between K1 direction and orientation of quartz c-axis in the case of an undeformed single quartz crystal. Pole figures of quartz indicate that quartz c-axis tends to be parallel to K1 direction only in the case where intracrystalline deformation of quartz is accommodated by prism slip. It is therefore established that AMS investigation of quartz from deformed rocks gives information about SPO. Thus, it is concluded that petrofabric information of quartzite obtained from AMS is a manifestation of its shape anisotropy and not crystallographic preferred orientation.
Mechanical Behavior of Collagen-Fibrin Co-Gels Reflects Transition From Series to Parallel Interactions With Increasing Collagen Content

PubMed Central

Lai, Victor K.; Lake, Spencer P.; Frey, Christina R.; Tranquillo, Robert T.; Barocas, Victor H.

2012-01-01

Fibrin and collagen, biopolymers occurring naturally in the body, are biomaterials commonly used as scaffolds for tissue engineering. How collagen and fibrin interact to confer macroscopic mechanical properties in collagen-fibrin composite systems remains poorly understood. In this study, we formulated collagen-fibrin co-gels at different collagen-to-fibrin ratios to observe changes in the overall mechanical behavior and microstructure. A modeling framework of a two-network system was developed by modifying our micro-scale model, considering two forms of interaction between the networks: (a) two interpenetrating but noninteracting networks (“parallel”), and (b) a single network consisting of randomly alternating collagen and fibrin fibrils (“series”). Mechanical testing of our gels shows that collagen-fibrin co-gels exhibit intermediate properties (UTS, strain at failure, tangent modulus) compared to those of pure collagen and fibrin. The comparison with model predictions shows that the parallel and series model cases provide upper and lower bounds, respectively, for the experimental data, suggesting that a combination of such interactions exists between the collagen and fibrin in co-gels. A transition from the series model to the parallel model occurs with increasing collagen content, with the series model best describing predominantly fibrin co-gels, and the parallel model best describing predominantly collagen co-gels. PMID:22482659
Northeast Artificial Intelligence Consortium Annual Report - 1988 Parallel Vision. Volume 9

DTIC Science & Technology

1989-10-01

supports the Northeast Artificial Intelligence Consortium (NAIC). Volume 9 Parallel Vision Report submitted by Christopher M. Brown Randal C. Nelson...NORTHEAST ARTIFICIAL INTELLIGENCE CONSORTIUM ANNUAL REPORT - 1988 Parallel Vision Syracuse University Christopher M. Brown and Randal C. Nelson...Technical Director Directorate of Intelligence & Reconnaissance FOR THE COMMANDER: IGOR G. PLONISCH Directorate of Plans & Programs If your address has
High Performance Input/Output for Parallel Computer Systems

NASA Technical Reports Server (NTRS)

Ligon, W. B.

1996-01-01

The goal of our project is to study the I/O characteristics of parallel applications used in Earth Science data processing systems such as Regional Data Centers (RDCs) or EOSDIS. Our approach is to study the runtime behavior of typical programs and the effect of key parameters of the I/O subsystem both under simulation and with direct experimentation on parallel systems. Our three-year activity has focused on two items: developing a test bed that facilitates experimentation with parallel I/O, and studying representative programs from the Earth science data processing application domain. The Parallel Virtual File System (PVFS) has been developed for use on a number of platforms including the Tiger Parallel Architecture Workbench (TPAW) simulator, the Intel Paragon, a cluster of DEC Alpha workstations, and the Beowulf system (at CESDIS). PVFS provides considerable flexibility in configuring I/O in a UNIX-like environment. Access to key performance parameters facilitates experimentation. We have studied several key applications from levels 1, 2 and 3 of the typical RDC processing scenario including instrument calibration and navigation, image classification, and numerical modeling codes. We have also considered large-scale scientific database codes used to organize image data.
The Research of the Parallel Computing Development from the Angle of Cloud Computing

NASA Astrophysics Data System (ADS)

Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

2017-10-01

Cloud computing is a development of parallel computing, distributed computing and grid computing. The development of cloud computing brings parallel computing into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces several traditional parallel programming models. Secondly, it analyzes the principles, advantages and disadvantages of OpenMP, MPI and MapReduce respectively. Finally, it compares the MPI and OpenMP models with MapReduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
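Of the three models compared in this abstract, MapReduce is the easiest to sketch without extra libraries. The single-process word count below only illustrates the map, shuffle and reduce phases; the function names are hypothetical, and a real framework would distribute each phase across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over all documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))
```

Because each `map_phase` call depends only on its own document and each key is reduced independently, both phases can in principle run in parallel, which is the property the model relies on.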
A real-time MPEG software decoder using a portable message-passing library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

1995-12-31

We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easily ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment in which it runs. Several technical issues are discussed, including balancing of decoding speed, memory limitation, I/O capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible on a general-purpose parallel machine.
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-01-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-09-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Support of Multidimensional Parallelism in the OpenMP Programming Model

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele

2003-01-01

OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often affect the scalability of applications. Examples of these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Dritz, K.W.; Boyle, J.M.

This paper addresses the problem of measuring and analyzing the performance of fine-grained parallel programs running on shared-memory multiprocessors. Such processors use locking (either directly in the application program, or indirectly in a subroutine library or the operating system) to serialize accesses to global variables. Given sufficiently high rates of locking, the chief factor preventing linear speedup (besides lack of adequate inherent parallelism in the application) is lock contention - the blocking of processes that are trying to acquire a lock currently held by another process. We show how a high-resolution, low-overhead clock may be used to measure both lock contention and lack of parallel work. Several ways of presenting the results are covered, culminating in a method for calculating, in a single multiprocessing run, both the speedup actually achieved and the speedup lost to contention for each lock and to lack of parallel work. The speedup losses are reported in the same units, "processor-equivalents," as the speedup achieved. Both are obtained without having to perform the usual one-process comparison run. We chronicle also a variety of experiments motivated by actual results obtained with our measurement method. The insights into program performance that we gained from these experiments helped us to refine the parts of our programs concerned with communication and synchronization. Ultimately these improvements reduced lock contention to a negligible amount and yielded nearly linear speedup in applications not limited by lack of parallel work. We describe two generally applicable strategies ("code motion out of critical regions" and "critical-region fissioning") for reducing lock contention and one ("lock/variable fusion") applicable only on certain architectures.
ICASE Computer Science Program

NASA Technical Reports Server (NTRS)

1985-01-01

The Institute for Computer Applications in Science and Engineering computer science program is discussed in outline form. Information is given on such topics as problem decomposition, algorithm development, programming languages, and parallel architectures.
Patents, Innovation, and the Welfare Effects of Medicare Part D*

PubMed Central

Gailey, Adam; Lakdawalla, Darius; Sood, Neeraj

2013-01-01

Purpose To evaluate the efficiency consequences of the Medicare Part D program. Methods We develop and empirically calibrate a simple theoretical model to examine the static and dynamic welfare effects of Medicare Part D. Findings We show that Medicare Part D can simultaneously reduce static deadweight loss from monopoly pricing of drugs and improve incentives for innovation. We estimate that even after excluding the insurance value of the program, the welfare gain of Medicare Part D roughly equals its social costs. The program generates $5.11 billion of annual static deadweight loss reduction, and at least $3.0 billion of annual value from extra innovation. Implications Medicare Part D and other public prescription drug programs can be welfare-improving, even for risk-neutral and purely self-interested consumers. Furthermore, negotiation for lower branded drug prices may further increase the social return to the program. Originality This study demonstrates that pure efficiency motives, which do not even surface in the policy debate over Medicare Part D, can nearly justify the program on their own merits. PMID:20575239
Conference Proceedings: Annual Review of Progress in Applied Computational Electromagnetics (ACES '94) (10th) Held in Monterey, California on March 21-26, 1994. Volume 1

DTIC Science & Technology

1994-01-01

inborno- geneoui medium, Communications on Pure and Applied Mathematics, XVI, (1963). 363-38]. (8) M. Born and E . Wolf, Principles of Optics...of initiated communications . The final sta• e of the parallalised partitioning technique is the solution of a coupling matrix by the use of a parallel...Frequmeny Asympofic Exposoio for Hypebllcc Equaiomes" by B. Ewupuia. E . PAmni, and S. Odwn 32 ’A New- To’hmapa for Synthesis of OffsK Dud Rtfeca Sysmm
Recursive Algorithms for Real-Time Digital CR-RCn Pulse Shaping

NASA Astrophysics Data System (ADS)

Nakhostin, M.

2011-10-01

This paper reports on recursive algorithms for real-time implementation of CR-(RC)n filters in digital nuclear spectroscopy systems. The algorithms are derived by calculating the Z-transfer function of the filters for filter orders up to n=4. The performances of the filters are compared with the performance of the conventional digital trapezoidal filter using a noise generator which separately generates pure series, 1/f and parallel noise. The results of our study enable one to select the optimum digital filter for different noise and rate conditions.
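A recursive CR-(RC)n shaper can be sketched as a cascade of first-order sections. The version below is not the set of formulae derived in the paper: it assumes a simple backward-difference discretization, equal time constants for every stage, and a hypothetical function name `cr_rc_n`. It does show the defining property of a recursive filter, namely that each output sample needs only the current input and the previous state.

```python
from typing import List, Sequence


def cr_rc_n(signal: Sequence[float], tau: float, dt: float, n: int = 1) -> List[float]:
    """Apply one CR (high-pass) stage followed by n RC (low-pass) stages.

    tau is the shaping time constant and dt the sampling interval (same units).
    """
    a = tau / (tau + dt)  # feedback coefficient shared by both section types
    b = dt / (tau + dt)   # input weight of the low-pass sections

    # CR differentiator: y[k] = a * (y[k-1] + x[k] - x[k-1])
    out: List[float] = []
    prev_x = 0.0
    prev_y = 0.0
    for x in signal:
        prev_y = a * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)

    # n RC integrators: y[k] = y[k-1] + b * (x[k] - y[k-1])
    for _ in range(n):
        prev_y = 0.0
        stage: List[float] = []
        for x in out:
            prev_y += b * (x - prev_y)
            stage.append(prev_y)
        out = stage
    return out
```

Feeding a unit step (an idealized preamplifier output) through `cr_rc_n([1.0] * 500, tau=10.0, dt=1.0, n=1)` gives a unipolar pulse that rises to a peak and then returns to the baseline.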
JCell--a Java-based framework for inferring regulatory networks from time series data.

PubMed

Spieth, C; Supper, J; Streichert, F; Speer, N; Zell, A

2006-08-15

JCell is a Java-based application for reconstructing gene regulatory networks from experimental data. The framework provides several algorithms to identify genetic and metabolic dependencies based on experimental data conjoint with mathematical models to describe and simulate regulatory systems. Owing to the modular structure, researchers can easily implement new methods. JCell is a pure Java application with additional scripting capabilities and thus widely usable, e.g. on parallel or cluster computers. The software is freely available for download at http://www-ra.informatik.uni-tuebingen.de/software/JCell.
Lattice QCD calculation using VPP500

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Seyong; Ohta, Shigemi

1995-02-01

A new vector parallel supercomputer, Fujitsu VPP500, was installed at RIKEN earlier this year. It consists of 30 vector computers, each with 1.6 GFLOPS peak speed and 256 MB memory, connected by a crossbar switch with 400 MB/s peak data transfer rate each way between any pair of nodes. The authors developed a Fortran lattice QCD simulation code for it. It runs at about 1.1 GFLOPS sustained per node for Metropolis pure-gauge update, and about 0.8 GFLOPS sustained per node for conjugate gradient inversion of staggered fermion matrix.
Cyclotron line resonant transfer through neutron star atmospheres

NASA Technical Reports Server (NTRS)

Wang, John C. L.; Wasserman, Ira M.; Salpeter, Edwin E.

1988-01-01

Monte Carlo methods are used to study in detail the resonant radiative transfer of cyclotron line photons with recoil through a purely scattering neutron star atmosphere for both the polarized and unpolarized cases. For each case, the number of scatters, the path length traveled, the escape frequency shift, the escape direction cosine, the emergent frequency spectra, and the angular distribution of escaping photons are investigated. In the polarized case, transfer is calculated using both the cold plasma e- and o-modes and the magnetic vacuum perpendicular and parallel modes.
The neural basis of parallel saccade programming: an fMRI study.

PubMed

Hu, Yanbo; Walker, Robin

2011-11-01

The neural basis of parallel saccade programming was examined in an event-related fMRI study using a variation of the double-step saccade paradigm. Two double-step conditions were used: one enabled the second saccade to be partially programmed in parallel with the first saccade while in a second condition both saccades had to be prepared serially. The intersaccadic interval, observed in the parallel programming (PP) condition, was significantly reduced compared with latency in the serial programming (SP) condition and also to the latency of single saccades in control conditions. The fMRI analysis revealed greater activity (BOLD response) in the frontal and parietal eye fields for the PP condition compared with the SP double-step condition and when compared with the single-saccade control conditions. By contrast, activity in the supplementary eye fields was greater for the double-step condition than the single-step condition but did not distinguish between the PP and SP requirements. The role of the frontal eye fields in PP may be related to the advanced temporal preparation and increased salience of the second saccade goal that may mediate activity in other downstream structures, such as the superior colliculus. The parietal lobes may be involved in the preparation for spatial remapping, which is required in double-step conditions. The supplementary eye fields appear to have a more general role in planning saccade sequences that may be related to error monitoring and the control over the execution of the correct sequence of responses.
A Highly Parallelized Special-Purpose Computer for Many-Body Simulations with an Arbitrary Central Force: MD-GRAPE

NASA Astrophysics Data System (ADS)

Fukushige, Toshiyuki; Taiji, Makoto; Makino, Junichiro; Ebisuzaki, Toshikazu; Sugimoto, Daiichiro

1996-09-01

We have developed a parallel, pipelined special-purpose computer for N-body simulations, MD-GRAPE (for "GRAvity PipE"). In gravitational N-body simulations, almost all computing time is spent on the calculation of interactions between particles. GRAPE is specialized hardware to calculate these interactions. It is used with a general-purpose front-end computer that performs all calculations other than the force calculation. MD-GRAPE is the first parallel GRAPE that can calculate an arbitrary central force. A force different from a pure 1/r potential is necessary for N-body simulations with periodic boundary conditions using the Ewald or particle-particle/particle-mesh (P^3M) method. MD-GRAPE accelerates the calculation of particle-particle force for these algorithms. An MD-GRAPE board has four MD chips and its peak performance is 4.2 GFLOPS. On an MD-GRAPE board, a cosmological N-body simulation takes 600(N/10^6)^(3/2) s per step for the Ewald method, where N is the number of particles, and would take 240(N/10^6) s per step for the P^3M method, in a uniform distribution of particles.
Supercomputing '91; Proceedings of the 4th Annual Conference on High Performance Computing, Albuquerque, NM, Nov. 18-22, 1991

NASA Technical Reports Server (NTRS)

1991-01-01

Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)

Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

1994-01-01

An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project

NASA Technical Reports Server (NTRS)

Woo, Alex C.; Hill, Kueichien C.

1996-01-01

The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Projects Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.
Parallel line analysis: multifunctional software for the biomedical sciences

NASA Technical Reports Server (NTRS)

Swank, P. R.; Lewis, M. L.; Damron, K. L.; Morrison, D. R.

1990-01-01

An easy to use, interactive FORTRAN program for analyzing the results of parallel line assays is described. The program is menu driven and consists of five major components: data entry, data editing, manual analysis, manual plotting, and automatic analysis and plotting. Data can be entered from the terminal or from previously created data files. The data editing portion of the program is used to inspect and modify data and to statistically identify outliers. The manual analysis component is used to test the assumptions necessary for parallel line assays using analysis of covariance techniques and to determine potency ratios with confidence limits. The manual plotting component provides a graphic display of the data on the terminal screen or on a standard line printer. The automatic portion runs through multiple analyses without operator input. Data may be saved in a special file to expedite input at a future time.
Optimized and parallelized implementation of the electronegativity equalization method and the atom-bond electronegativity equalization method.

PubMed

Vareková, R Svobodová; Koca, J

2006-02-01

The most common way to calculate charge distribution in a molecule is ab initio quantum mechanics (QM). Some faster alternatives to QM have also been developed, the so-called "equalization methods" EEM and ABEEM, which are based on DFT. We have implemented and optimized the EEM and ABEEM methods and created the EEM SOLVER and ABEEM SOLVER programs. It has been found that the most time-consuming part of equalization methods is the reduction of the matrix belonging to the equation system generated by the method. Therefore, for both methods this part was replaced by the parallel algorithm WIRS and implemented within the PVM environment. The parallelized versions of the programs EEM SOLVER and ABEEM SOLVER showed promising results, especially on a single computer with several processors (compact PVM). The implemented programs are available through the Web page http://ncbr.chemi.muni.cz/~n19n/eem_abeem.
Automation of Data Traffic Control on DSM Architecture

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

2001-01-01

The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, achieving good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestion. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition and page size control, and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestion in Fortran array oriented codes and advises the user on code transformations for improving data traffic in the application.
Developing Information Power Grid Based Algorithms and Software

NASA Technical Reports Server (NTRS)

Dongarra, Jack

1998-01-01

This exploratory study initiated our effort to understand performance modeling on parallel systems. The basic goal of performance modeling is to understand and predict the performance of a computer program or set of programs on a computer system. Performance modeling has numerous applications, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. Our work lays the basis for the construction of parallel libraries that allow for the reconstruction of application codes on several distinct architectures so as to assure performance portability. Following our strategy, once the requirements of applications are well understood, one can then construct a library in a layered fashion. The top level of this library will consist of architecture-independent geometric, numerical, and symbolic algorithms that are needed by the sample of applications. These routines should be written in a language that is portable across the targeted architectures.
Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Taylor, Arthur C., III

1994-01-01

This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.
Memory-based frame synchronizer. [for digital communication systems]

NASA Technical Reports Server (NTRS)

Stattel, R. J.; Niswander, J. K. (Inventor)

1981-01-01

A frame synchronizer for use in digital communications systems wherein data formats can be easily and dynamically changed is described. The use of memory array elements provides increased flexibility in format selection and sync word selection in addition to real time reconfiguration ability. The frame synchronizer comprises a serial-to-parallel converter which converts a serial input data stream to a constantly changing parallel data output. This parallel data output is supplied to programmable sync word recognizers each consisting of a multiplexer and a random access memory (RAM). The multiplexer is connected to both the parallel data output and an address bus which may be connected to a microprocessor or computer for purposes of programming the sync word recognizer. The RAM is used as an associative memory or decoder and is programmed to identify a specific sync word. Additional programmable RAMs are used as counter decoders to define word bit length, frame word length, and paragraph frame length.
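The recognizer described above can be modeled in software as a shift register feeding a lookup table. The sketch below is only an illustration under simplifying assumptions: the RAM is a Python list addressed directly by the parallel word, the multiplexer and address bus are omitted, and the function name and parameters are hypothetical.

```python
from typing import List, Sequence


def find_sync_positions(bits: Sequence[int], sync_word: int, width: int) -> List[int]:
    """Return the start index of every occurrence of `sync_word` in `bits`.

    The shift register plays the role of the serial-to-parallel converter;
    the `ram` table plays the role of the programmable recognizer.
    """
    # "Program" the RAM: only the address equal to the sync word holds True.
    ram = [False] * (1 << width)
    ram[sync_word] = True

    mask = (1 << width) - 1
    shift_reg = 0
    hits: List[int] = []
    for i, bit in enumerate(bits):
        # Shift in the newest bit; the register is the changing parallel output.
        shift_reg = ((shift_reg << 1) | (bit & 1)) & mask
        # Ignore the first width-1 bits, when the register is not yet full.
        if i + 1 >= width and ram[shift_reg]:
            hits.append(i + 1 - width)
    return hits
```

Changing the sync word at run time amounts to rewriting one table entry, which mirrors the reconfigurability the abstract emphasizes.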
Parallel algorithms for modeling flow in permeable media. Annual report, February 15, 1995 - February 14, 1996

DOE Office of Scientific and Technical Information (OSTI.GOV)

G.A. Pope; K. Sephernoori; D.C. McKinney

1996-03-15

This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time that is required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to CRAY T3D and CRAY T3E, distributed-memory, multiprocessor computers using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that the portability of the original code across different computer architectures was made possible.
Parallel/distributed direct method for solving linear systems

NASA Technical Reports Server (NTRS)

Lin, Avi

1990-01-01

A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit a near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate parallel algorithm is insensitive to the number of processors as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmic changes.
Method, systems, and computer program products for implementing function-parallel network firewall

DOEpatents

Fulp, Errin W [Winston-Salem, NC]; Farley, Ryan J [Winston-Salem, NC]

2011-10-11

Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.
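The rule-set split described above can be sketched as follows. This is a minimal illustration, not the patented method: the round-robin partition, the dictionary-based rule format, and in particular the way per-node results are combined (lowest rule index wins, to preserve first-match order) are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# (priority index, fields that must match, action); hypothetical rule format.
Rule = Tuple[int, Dict[str, object], str]


def partition(rules: List[Rule], n_nodes: int) -> List[List[Rule]]:
    """Round-robin split: each node holds a portion, all portions cover the set."""
    return [rules[k::n_nodes] for k in range(n_nodes)]


def first_match(portion: List[Rule], packet: Dict[str, object]) -> Optional[Tuple[int, str]]:
    """What a single firewall node does: scan only its own portion."""
    for idx, fields, action in portion:
        if all(packet.get(key) == value for key, value in fields.items()):
            return idx, action
    return None


def decide(portions: List[List[Rule]], packet: Dict[str, object], default: str = "deny") -> str:
    """Combine node results; the lowest rule index wins (assumed policy)."""
    hits = [m for m in (first_match(p, packet) for p in portions) if m is not None]
    return min(hits)[1] if hits else default
```

In a real deployment each call to `first_match` would run on a separate node at the same time; here the nodes are evaluated one after another purely to show that the combined answer equals that of a single firewall scanning the whole rule set in order.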
A computer program for converting rectangular coordinates to latitude-longitude coordinates

USGS Publications Warehouse

Rutledge, A.T.

1989-01-01

A computer program was developed for converting the coordinates of any rectangular grid on a map to coordinates on a grid that is parallel to lines of equal latitude and longitude. Using this program in conjunction with groundwater flow models, the user can extract data and results from models with varying grid orientations and place these data into a grid structure that is oriented parallel to lines of equal latitude and longitude. All cells in the rectangular grid must have equal dimensions, and all cells in the latitude-longitude grid measure one minute by one minute. This program is applicable if the map used shows lines of equal latitude as arcs and lines of equal longitude as straight lines and assumes that the Earth's surface can be approximated as a sphere. The program user enters the row number, column number, and latitude and longitude of the midpoint of the cell for three test cells on the rectangular grid. The latitude and longitude of boundaries of the rectangular grid also are entered. By solving sets of simultaneous linear equations, the program calculates coefficients that are used for making the conversion. As an option in the program, the user may build a groundwater model file based on a grid that is parallel to lines of equal latitude and longitude. The program reads a data file based on the rectangular coordinates and automatically forms the new data file. (USGS)
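The abstract does not give the form of the simultaneous equations, so the sketch below assumes the simplest case: an affine map from (row, column) to latitude and to longitude, whose three coefficients each are fixed by the three test cells. The function names are hypothetical, and the real program may use a different model to account for latitude lines drawn as arcs.

```python
from typing import List, Sequence, Tuple


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def solve3(a: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("test cells are collinear; choose three non-collinear cells")
    solution = []
    for k in range(3):
        mk = [row[:] for row in a]
        for i in range(3):
            mk[i][k] = b[i]
        solution.append(det3(mk) / d)
    return solution


def fit_affine(cells: Sequence[Tuple[float, float, float, float]]):
    """cells: three (row, col, lat, lon) tuples. Returns (lat_coef, lon_coef)."""
    a = [[row, col, 1.0] for row, col, _, _ in cells]
    lat_coef = solve3(a, [lat for _, _, lat, _ in cells])
    lon_coef = solve3(a, [lon for _, _, _, lon in cells])
    return lat_coef, lon_coef


def to_lat_lon(row: float, col: float, lat_coef, lon_coef) -> Tuple[float, float]:
    """Apply the fitted coefficients to any cell of the rectangular grid."""
    lat = lat_coef[0] * row + lat_coef[1] * col + lat_coef[2]
    lon = lon_coef[0] * row + lon_coef[1] * col + lon_coef[2]
    return lat, lon
```

The three test cells must not lie on one straight line in the grid, otherwise the system is singular and the coefficients are not determined.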
A distributed version of the NASA Engine Performance Program

NASA Technical Reports Server (NTRS)

Cours, Jeffrey T.; Curlett, Brian P.

1993-01-01

Distributed NEPP, a version of the NASA Engine Performance Program, uses the original NEPP code but executes it in a distributed computer environment. Multiple workstations connected by a network increase the program's speed and, more importantly, the complexity of the cases it can handle in a reasonable time. Distributed NEPP uses the public domain software package, called Parallel Virtual Machine, allowing it to execute on clusters of machines containing many different architectures. It includes the capability to link with other computers, allowing them to process NEPP jobs in parallel. This paper discusses the design issues and granularity considerations that entered into programming Distributed NEPP and presents the results of timing runs.
Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC

NASA Astrophysics Data System (ADS)

Alruwaili, Manal

With advancing technology, the number of processors is becoming massive. Current supercomputer processing will be available on desktops in the next decade. For mass-scale application software development on the massively parallel computing available on desktops, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massive parallel computing and distributed memory models while retaining user-friendliness. Currently available object-oriented languages for massive parallel computing, such as Chapel, X10 and UPC++, exploit distributed computing, data-parallel computing and thread-parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they lack: 1) any extension for object distribution to exploit the PGAS model; 2) the flexibility of migrating or cloning an object between places to exploit load balancing; and 3) the programming paradigms that will result from the integration of data and thread-level parallelism and object distribution. In the proposed thesis, I compare different languages in the PGAS model; propose new constructs that extend C++ with object distribution and object migration; and integrate PGAS-based process constructs with these extensions on distributed objects (object cloning and object migration). A new paradigm, MIDD (Multiple Invocation Distributed Data), is also presented, in which different copies of the same class can be invoked and work on different elements of distributed data concurrently using remote method invocations. I present the new constructs, their grammar and their behavior, and explain them using simple programs that use these constructs.
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R

Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Using the Parallel Computing Toolbox with MATLAB on the Peregrine System

Science.gov Websites

parallel pool took %g seconds.\\n', toc) % "single program multiple data" spmd fprintf('Worker %d says Hello World!\\n', labindex) end delete(gcp); % close the parallel pool exit To run the script on a compute node, create the file helloWorld.sub: #!/bin/bash #PBS -l walltime=05:00 #PBS -l nodes=1 #PBS -N
Address tracing for parallel machines

NASA Technical Reports Server (NTRS)

Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

1991-01-01

Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
By Hand or Not By-Hand: A Case Study of Alternative Approaches to Parallelize CFD Applications

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Bailey, David (Technical Monitor)

1997-01-01

While parallel processing promises to speed up applications by several orders of magnitude, the performance achieved still depends upon several factors, including the multiprocessor architecture, system software, data distribution and alignment, as well as the methods used for partitioning the application and mapping its components onto the architecture. The existence of the Gordon Bell Prize given out at Supercomputing every year suggests that while good performance can be attained for real applications on general purpose multiprocessors, the large investment in man-power and time still has to be repeated for each application-machine combination. As applications and machine architectures become more complex, the cost and time-delays for obtaining performance by hand will become prohibitive. Computer users today can turn to three possible avenues for help: parallel libraries, parallel languages and compilers, and interactive parallelization tools. The success of these methodologies, in turn, depends on proper application of data dependency analysis, program structure recognition and transformation, performance prediction as well as exploitation of user supplied knowledge. NASA has been developing multidisciplinary applications on highly parallel architectures under the High Performance Computing and Communications Program. Over the past six years, transitions in the underlying hardware and system software have forced the scientists to spend a large effort to migrate and recode their applications. Various attempts to exploit software tools to automate the parallelization process have not produced favorable results. In this paper, we report our most recent experience with CAPTOOL, a package developed at Greenwich University. We have chosen CAPTOOL for three reasons: 1. CAPTOOL accepts a FORTRAN 77 program as input. This suggests its potential applicability to a large collection of legacy codes currently in use. 2. CAPTOOL employs domain decomposition to obtain parallelism. Although the fact that not all kinds of parallelism are handled may seem unappealing, many NASA applications in computational aerosciences as well as earth and space sciences are amenable to domain decomposition. 3. CAPTOOL generates code for a large variety of environments employed across NASA centers, from MPI/PVM on networks of workstations to the IBM SP2 and CRAY T3D.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Taft, James R.

1999-01-01

Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amadio, G.; et al.

An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting in parallel particles in complex geometries exploiting instruction level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.

Undergraduate Training for Industrial Careers.

ERIC Educational Resources Information Center

Stehney, Ann K.

1983-01-01

Forty-eight mathematicians in industry, business, and government replied to a questionnaire on the relative merits of the traditional undergraduate curriculum, advanced topics in pure mathematics, computer programing, additional computer science, and specialized or applied topics. They favored programing and applied mathematics, along with a…
A Hardware-Accelerated Quantum Monte Carlo framework (HAQMC) for N-body systems

NASA Astrophysics Data System (ADS)

Gothandaraman, Akila; Peterson, Gregory D.; Warren, G. Lee; Hinde, Robert J.; Harrison, Robert J.

2009-12-01

Interest in the study of structural and energetic properties of highly quantum clusters, such as inert gas clusters, has motivated the development of a hardware-accelerated framework for Quantum Monte Carlo simulations. In the Quantum Monte Carlo method, the properties of a system of atoms, such as the ground-state energies, are averaged over a number of iterations. Our framework is aimed at accelerating the computations in each iteration of the QMC application by offloading the calculation of properties, namely energy and trial wave function, onto reconfigurable hardware. This gives a user the capability to run simulations for a large number of iterations, thereby reducing the statistical uncertainty in the properties, and for larger clusters. This framework is designed to run on the Cray XD1 high performance reconfigurable computing platform, which exploits the coarse-grained parallelism of the processor along with the fine-grained parallelism of the reconfigurable computing devices available in the form of field-programmable gate arrays. In this paper, we illustrate the functioning of the framework, which can be used to calculate the energies for a model cluster of helium atoms. In addition, we present the capabilities of the framework that allow the user to vary the chemical identities of the simulated atoms.
Program summary
Program title: Hardware Accelerated Quantum Monte Carlo (HAQMC)
Catalogue identifier: AEEP_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEP_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 691 537
No. of bytes in distributed program, including test data, etc.: 5 031 226
Distribution format: tar.gz
Programming language: C/C++ for the QMC application, VHDL and Xilinx 8.1 ISE/EDK tools for FPGA design and development
Computer: Cray XD1 consisting of a dual-core, dual-processor AMD Opteron 2.2 GHz with a Xilinx Virtex-4 (V4LX160) or Xilinx Virtex-II Pro (XC2VP50) FPGA per node. We use the compute node with the Xilinx Virtex-4 FPGA
Operating system: Red Hat Enterprise Linux OS
Has the code been vectorised or parallelized?: Yes
Classification: 6.1
Nature of problem: Quantum Monte Carlo is a practical method to solve the Schrödinger equation for large many-body systems and obtain the ground-state properties of such systems. This method involves the sampling of a number of configurations of atoms and averaging the properties of the configurations over a number of iterations. We are interested in applying the QMC method to obtain the energy and other properties of highly quantum clusters, such as inert gas clusters.
Solution method: The proposed framework provides a combined hardware-software approach, in which the QMC simulation is performed on the host processor, with the computationally intensive functions such as energy and trial wave function computations mapped onto the field-programmable gate array (FPGA) logic device attached as a co-processor to the host processor. We perform the QMC simulation for a number of iterations as in the case of our original software QMC approach, to reduce the statistical uncertainty of the results. However, our proposed HAQMC framework accelerates each iteration of the simulation, by significantly reducing the time taken to calculate the ground-state properties of the configurations of atoms, thereby accelerating the overall QMC simulation. We provide a generic interpolation framework that can be extended to study a variety of pure and doped atomic clusters, irrespective of the chemical identities of the atoms. For the FPGA implementation of the properties, we use a two-region approach for accurately computing the properties over the entire domain, employ deep pipelines and fixed-point for all our calculations guaranteeing the accuracy required for our simulation.
Cache Locality Optimization for Recursive Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lifflander, Jonathan; Krishnamoorthy, Sriram

We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.
A Comparison of Three Programming Models for Adaptive Applications

NASA Technical Reports Server (NTRS)

Shan, Hong-Zhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak; Kwak, Dochan (Technical Monitor)

2000-01-01

We study the performance and programming effort for two major classes of adaptive applications under three leading parallel programming models. We find that all three models can achieve scalable performance on state-of-the-art multiprocessor machines. The basic parallel algorithms needed for different programming models to deliver their best performance are similar, but the implementations differ greatly, far beyond the fact of using explicit messages versus implicit loads/stores. Compared with MPI and SHMEM, CC-SAS (cache-coherent shared address space) provides substantial ease of programming at the conceptual and program orchestration level, which often leads to a performance gain. However, it may also suffer from the poor spatial locality of physically distributed shared data on a large number of processors. Our CC-SAS implementation of the PARMETIS partitioner itself runs faster than in the other two programming models, and generates a more balanced result for our application.
Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform.

PubMed

Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun

2018-01-01

The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance.
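For readers unfamiliar with the Otsu step, the sketch below computes an Otsu threshold from a list of grey levels and derives a pair of Canny thresholds from it. It runs in a single process and does not reproduce the paper's MapReduce implementation. The helper `canny_thresholds` and its 0.5 ratio between the low and high thresholds are assumptions for illustration; the abstract does not state how the dual threshold is derived.

```python
from typing import Sequence, Tuple


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the grey level t that maximizes between-class variance.

    Class 0 holds pixels <= t and class 1 holds pixels > t.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w0 = 0          # number of pixels in class 0 so far
    sum0 = 0.0      # sum of grey levels in class 0 so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # class 0 is still empty
        w1 = total - w0
        if w1 == 0:
            break               # class 1 is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def canny_thresholds(pixels: Sequence[int], low_ratio: float = 0.5) -> Tuple[float, float]:
    """Hypothetical helper: use the Otsu level as the high threshold and a
    fixed fraction of it as the low threshold (assumed ratio)."""
    high = float(otsu_threshold(pixels))
    return low_ratio * high, high
```

In a MapReduce setting each image can be thresholded independently, so a function like this would naturally sit inside the map step, with one image per record.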
Membrane triangles with corner drilling freedoms. II - The ANDES element

NASA Technical Reports Server (NTRS)

Felippa, Carlos A.; Militello, Carmelo

1992-01-01

This is the second article in a three-part series on the construction of 3-node, 9-dof membrane elements with normal-to-its-plane rotational freedoms (the so-called drilling freedoms) using parametrized variational principles. In this part, one such element is derived within the context of the assumed natural deviatoric strain (ANDES) formulation. The higher-order strains are obtained by constructing three parallel-to-sides pure-bending modes from which natural strains are obtained at the corner points and interpolated over the element. To attain rank sufficiency, an additional higher-order 'torsional' mode, corresponding to equal hierarchical rotations at each corner with all other motions precluded, is incorporated. The resulting formulation has five free parameters. When these parameters are optimized against pure bending by energy balance methods, the resulting element is found to coalesce with the optimal EFF element derived in Part I. Numerical integration as a strain filtering device is found to play a key role in this achievement.
Polarization-modulated FTIR spectroscopy of lipid/gramicidin monolayers at the air/water interface.

PubMed Central

Ulrich, W P; Vogel, H

1999-01-01

Monolayers of gramicidin A, pure and in mixtures with dimyristoylphosphatidylcholine (DMPC), were studied in situ at the air/H2O and air/D2O interfaces by polarization-modulated infrared reflection absorption spectroscopy (PM-IRRAS). Simulations of the entire set of amide I absorption modes were also performed, using complete parameter sets for different conformations based on published normal mode calculations. The structure of gramicidin A in the DMPC monolayer could clearly be assigned to a beta6.3 helix. Quantitative analysis of the amide I bands revealed that at film pressures of up to 25-30 mN/m the helix tilt angle from the vertical in the pure gramicidin A layer exceeded 60 degrees. A marked dependence of the peptide orientation on the applied surface pressure was observed for the mixed lipid-peptide monolayers. At low pressure the helix lay flat on the surface, whereas at high pressures the helix was oriented almost parallel to the surface normal. PMID:10049344
Operational Activations Of Maritime Surveillance Services Within The Framework Of MARISS, NEREIDS And SAGRES Projects

NASA Astrophysics Data System (ADS)

Margarit, G.

2013-12-01

This paper presents the results obtained by GMV in the maritime surveillance operational activations conducted in a set of research projects. These activations have been actively supported by users, whose feedback has been essential for better understanding their needs and the most urgently requested improvements. Different domains have been evaluated, from purely theoretical and scientific background (in terms of processing algorithms) up to purely logistical issues (IT configuration issues, strategies for improving system performance and avoiding bottlenecks, parallelization and back-up procedures). In all cases, automation is the key word because users need near-real-time operations where the interaction of human operators is minimized. In addition, automation reduces human-derived errors and provides better error tracking procedures. In the paper, different examples will be depicted and analysed. Because of space limitations, only the most representative ones will be selected. Feedback from users will be included and analysed as well.
A new Hysteretic Nonlinear Energy Sink (HNES)

NASA Astrophysics Data System (ADS)

Tsiatas, George C.; Charalampakis, Aristotelis E.

2018-07-01

The behavior of a new Hysteretic Nonlinear Energy Sink (HNES) coupled to a linear primary oscillator is investigated in shock mitigation. Apart from a small mass and a nonlinear elastic spring of the Duffing oscillator, the HNES is also comprised of a purely hysteretic and a linear elastic spring of potentially negative stiffness, connected in parallel. The Bouc-Wen model is used to describe the force produced by both the purely hysteretic and linear elastic springs. Coupling the primary oscillator with the HNES, three nonlinear equations of motion are derived in terms of the two displacements and the dimensionless hysteretic variable, which are integrated numerically using the analog equation method. The performance of the HNES is examined by quantifying the percentage of the initially induced energy in the primary system that is passively transferred and dissipated by the HNES. Remarkable results are achieved for a wide range of initial input energies. The great performance of the HNES is mostly evidenced when the linear spring stiffness takes on negative values.
Three new enantiomerically pure ferrocenylphosphole compounds.

PubMed

López Cortés, José Guadalupe; Vincendeau, Sandrine; Daran, Jean Claude; Manoury, Eric; Gouygou, Maryse

2006-05-01

The absolute configurations of three new enantiomerically pure ferrocenylphosphole compounds, namely (2S,4S,S(Fc))-4-methoxymethyl-2-[2-(9-thioxo-9lambda5-phosphafluoren-9-yl)ferrocenyl]-1,3-dioxane, [Fe(C5H5)(C23H22O3PS)], (III), (S(Fc))-[2-(9-thioxo-9lambda5-phosphafluoren-9-yl)ferrocenyl]methanol, [Fe(C5H5)(C18H14OPS)], (V), and (S(Fc))-diphenyl[2-(9-thioxo-9lambda5-phosphafluoren-9-yl]ferrocenylmethyl]phosphine, [Fe(C5H5)(C30H23P2)], (VIII), have been unambiguously established. All three ligands contain a planar chiral ferrocene group, bearing a dibenzophosphole and either a dioxane, a methanol or a diphenylphosphinomethane group on the same cyclopentadienyl. In compound (V), the occurrence of O-H...S and C-H...S hydrogen bonds results in the formation of a two-dimensional network parallel to (001). The geometry of the ferrocene frameworks agrees with related reported structures.
Some TEM observations of Al2O3 scales formed on NiCrAl alloys

NASA Technical Reports Server (NTRS)

Smialek, J.; Gibala, R.

1979-01-01

The microstructural development of Al2O3 scales on NiCrAl alloys has been examined by transmission electron microscopy. Voids were observed within grains in scales formed on a pure NiCrAl alloy. Both voids and oxide grains grew measurably with oxidation time at 1100 C. The size and amount of porosity decreased towards the oxide-metal growth interface. The voids resulted from an excess number of oxygen vacancies near the oxide-metal interface. Short-circuit diffusion paths were discussed in reference to current growth stress models for oxide scales. Transient oxidation of pure, Y-doped, and Zr-doped NiCrAl was also examined. Oriented alpha-(Al, Cr)2O3 and Ni(Al, Cr)2O4 scales often coexisted in layered structures on all three alloys. Close-packed oxygen planes and directions in the corundum and spinel layers were parallel. The close relationship between oxide layers provided a gradual transition from initial transient scales to steady state Al2O3 growth.
Parallel-Processing Software for Correlating Stereo Images

NASA Technical Reports Server (NTRS)

Klimeck, Gerhard; Deen, Robert; Mcauley, Michael; DeJong, Eric

2007-01-01

A computer program implements parallel-processing algorithms for correlating images of terrain acquired by stereoscopic pairs of digital stereo cameras on an exploratory robotic vehicle (e.g., a Mars rover). Such correlations are used to create three-dimensional computational models of the terrain for navigation. In this program, the scene viewed by the cameras is segmented into subimages. Each subimage is assigned to one of a number of central processing units (CPUs) operating simultaneously.
Optimization by nonhierarchical asynchronous decomposition

NASA Technical Reports Server (NTRS)

Shankar, Jayashree; Ribbens, Calvin J.; Haftka, Raphael T.; Watson, Layne T.

1992-01-01

Large scale optimization problems are tractable only if they are somehow decomposed. Hierarchical decompositions are inappropriate for some types of problems and do not parallelize well. Sobieszczanski-Sobieski has proposed a nonhierarchical decomposition strategy for nonlinear constrained optimization that is naturally parallel. Despite some successes on engineering problems, the algorithm as originally proposed fails on simple two dimensional quadratic programs. The algorithm is carefully analyzed for quadratic programs, and a number of modifications are suggested to improve its robustness.
Final Report: Correctness Tools for Petascale Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mellor-Crummey, John

2014-10-27

In the course of developing parallel programs for leadership computing systems, subtle programming errors often arise that are extremely difficult to diagnose without tools. To meet this challenge, University of Maryland, the University of Wisconsin—Madison, and Rice University worked to develop lightweight tools to help code developers pinpoint a variety of program correctness errors that plague parallel scientific codes. The aim of this project was to develop software tools that help diagnose program errors including memory leaks, memory access errors, round-off errors, and data races. Research at Rice University focused on developing algorithms and data structures to support efficient monitoring of multithreaded programs for memory access errors and data races. This is a final report about research and development work at Rice University as part of this project.
ng: What next-generation languages can teach us about HENP frameworks in the manycore era

NASA Astrophysics Data System (ADS)

Binet, Sébastien

2011-12-01

Current High Energy and Nuclear Physics (HENP) frameworks were written before multicore systems became widely deployed. A 'single-thread' execution model naturally emerged from that environment; however, this no longer fits the processing model at the dawn of the manycore era. Although previous work focused on minimizing the changes to be applied to the LHC frameworks (because of the data taking phase) while still trying to reap the benefits of the parallel-enhanced CPU architectures, this paper explores what new languages could bring to the design of the next-generation frameworks. Parallel programming is still in an intensive phase of R&D and no silver bullet exists despite the 30+ years of literature on the subject. Yet, several parallel programming styles have emerged: actors, message passing, communicating sequential processes, task-based programming, and data flow programming, to name a few. We present the work of the prototyping of a next-generation framework in new and expressive languages (Python and Go) to investigate how code clarity and robustness are affected and what the downsides are of using languages younger than FORTRAN/C/C++.
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

PubMed

Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

2004-09-09

Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be substantially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
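Strategy (a) in the abstract above relies on the fact that every sequence pair can be processed independently. The following is a minimal sketch of that idea only. `toy_pair_score` is a hypothetical placeholder that counts matching positions and is not DIALIGN's alignment algorithm, and `align_all_pairs` is an illustrative name. A thread pool is used here for portability; a CPU-bound workload would more likely use separate processes or nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def toy_pair_score(seq_a, seq_b):
    """Placeholder for a pairwise alignment: count identical positions."""
    return sum(x == y for x, y in zip(seq_a, seq_b))


def align_all_pairs(sequences, workers=4):
    """Score every unordered pair of sequences, distributing pairs to workers."""
    pairs = list(combinations(range(len(sequences)), 2))

    def score(pair):
        i, j = pair
        return toy_pair_score(sequences[i], sequences[j])

    # pool.map returns results in input order, so the output does not
    # depend on how the pairs were scheduled across workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score, pairs))
    return dict(zip(pairs, scores))


result = align_all_pairs(["ACGT", "ACGA", "TCGA"])
```

Because the pairs share no state, the number of workers changes only the running time, not the result.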
A Programming Framework for Scientific Applications on CPU-GPU Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Owens, John

2013-03-24

At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been a move to commodity parallel architectures. This sea change is motivated by the industry’s inability to continue to profitably increase performance on a single processor and instead to move to multiple parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver superior performance on a broad range of problems compared with their CPU counterparts, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.
Transputer parallel processing at NASA Lewis Research Center

NASA Technical Reports Server (NTRS)

Ellis, Graham K.

1989-01-01

The transputer parallel processing lab at NASA Lewis Research Center (LeRC) consists of 69 processors (transputers) that can be connected into various networks for use in general purpose concurrent processing applications. The main goal of the lab is to develop concurrent scientific and engineering application programs that will take advantage of the computational speed increases available on a parallel processor over the traditional sequential processor. Current research involves the development of basic programming tools. These tools will help standardize program interfaces to specific hardware by providing a set of common libraries for applications programmers. The thrust of the current effort is in developing a set of tools for graphics rendering/animation. The applications programmer currently has two options for on-screen plotting. One option can be used for static graphics displays and the other can be used for animated motion. The option for static display involves the use of 2-D graphics primitives that can be called from within an application program. These routines perform the standard 2-D geometric graphics operations in real-coordinate space as well as allowing multiple windows on a single screen.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chrisochoides, N.; Sukup, F.

In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, by preventing the need for global mesh refinement. Its implementation on distributed memory multicomputers using the traditional data-parallel model has proven very inefficient due to excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate synchronization costs inherent to data-parallel methods by exploring concurrency at the processor level. Our preliminary performance data indicate that the task-parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimum overheads compared to the "best" sequential implementation of the BW algorithm.
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++

NASA Technical Reports Server (NTRS)

Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.

1996-01-01

This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.

Analysis of the study skills of undergraduate pharmacy students of the University of Zambia School of Medicine.

PubMed

Ezeala, Christian Chinyere; Siyanga, Nalucha

2015-01-01

This study aimed to compare the study skills of two groups of undergraduate pharmacy students in the School of Medicine, University of Zambia using the Study Skills Assessment Questionnaire (SSAQ), with the goal of analysing students' study skills and identifying factors that affect study skills. A questionnaire was distributed to 67 participants from both programs using stratified random sampling. Completed questionnaires were rated according to participants' study skills. The total scores and scores within subscales were analysed and compared quantitatively. Questionnaires were distributed to 37 students in the regular program, and to 30 students in the parallel program. The response rate was 100%. Students had moderate to good study skills: 22 respondents (32.8%) showed good study skills, while 45 respondents (67.2%) were found to have moderate study skills. Students in the parallel program demonstrated significantly better study skills (mean SSAQ score, 185.4±14.5), particularly in time management and writing, than the students in the regular program (mean SSAQ score 175±25.4; P<0.05). No significant differences were found according to age, gender, residential or marital status, or level of study. The students in the parallel program had better time management and writing skills, probably due to their prior work experience. More intensive training is needed for students in the regular program to improve their time management and writing skills.
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

1991-01-01

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
High-energy physics software parallelization using database techniques

NASA Astrophysics Data System (ADS)

Argante, E.; van der Stok, P. D. V.; Willers, I.

1997-02-01

A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI.
Development, Verification and Validation of Parallel, Scalable Volume of Fluid CFD Program for Propulsion Applications

NASA Technical Reports Server (NTRS)

West, Jeff; Yang, H. Q.

2014-01-01

There are many instances involving liquid/gas interfaces and their dynamics in the design of liquid engine powered rockets such as the Space Launch System (SLS). Some examples of these applications are: propellant tank draining and slosh, subcritical condition injector analysis for gas generators, preburners and thrust chambers, water deluge mitigation for launch induced environments and even solid rocket motor liquid slag dynamics. Commercially available CFD programs simulating gas/liquid interfaces using the Volume of Fluid approach are currently limited in their parallel scalability. In 2010 for instance, an internal NASA/MSFC review of three commercial tools revealed that parallel scalability was seriously compromised at 8 CPUs and no additional speedup was possible after 32 CPUs. Other non-interface CFD applications at the time were demonstrating useful parallel scalability up to 4,096 processors or more. Based on this review, NASA/MSFC initiated an effort to develop a Volume of Fluid implementation within the unstructured mesh, pressure-based algorithm CFD program, Loci-STREAM. After verification was achieved by comparing results to the commercial CFD program CFD-Ace+, and validation by direct comparison with data, Loci-STREAM-VoF is now the production CFD tool for propellant slosh force and slosh damping rate simulations at NASA/MSFC. On these applications, good parallel scalability has been demonstrated for problem sizes of tens of millions of cells and thousands of CPU cores. Ongoing efforts are focused on the application of Loci-STREAM-VoF to predict the transient flow patterns of water on the SLS Mobile Launch Platform in order to support the phasing of water for launch environment mitigation so that detrimental effects on the vehicle are not realized.
mm_par2.0: An object-oriented molecular dynamics simulation program parallelized using a hierarchical scheme with MPI and OPENMP

NASA Astrophysics Data System (ADS)

Oh, Kwang Jin; Kang, Ji Hoon; Myung, Hun Joo

2012-02-01

We have revised a general purpose parallel molecular dynamics simulation program mm_par using object-oriented programming. We parallelized the revised version using a hierarchical scheme in order to utilize more processors for a given system size. The benchmark result will be presented here.
New version program summary
Program title: mm_par2.0
Catalogue identifier: ADXP_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADXP_v2_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 2 390 858
No. of bytes in distributed program, including test data, etc.: 25 068 310
Distribution format: tar.gz
Programming language: C++
Computer: Any system operated by Linux or Unix
Operating system: Linux
Classification: 7.7
External routines: We provide wrappers for FFTW [1], Intel MKL library [2] FFT routine, and Numerical recipes [3] FFT, random number generator, and eigenvalue solver routines, SPRNG [4] random number generator, Mersenne Twister [5] random number generator, space filling curve routine.
Catalogue identifier of previous version: ADXP_v1_0
Journal reference of previous version: Comput. Phys. Comm. 174 (2006) 560
Does the new version supersede the previous version?: Yes
Nature of problem: Structural, thermodynamic, and dynamical properties of fluids and solids from microscopic scales to mesoscopic scales.
Solution method: Molecular dynamics simulation in NVE, NVT, and NPT ensemble, Langevin dynamics simulation, dissipative particle dynamics simulation.
Reasons for new version: First, object-oriented programming has been used, which is known to be open for extension and closed for modification. It is also known to be better for maintenance. Second, version 1.0 was based on atom decomposition and domain decomposition scheme [6] for parallelization. However, atom decomposition is not popular due to its poor scalability. On the other hand, domain decomposition scheme is better for scalability. It still has a limitation in utilizing a large number of cores on recent petascale computers due to the requirement that the domain size is larger than the potential cutoff distance. To go beyond such a limitation, a hierarchical parallelization scheme has been adopted in this new version and implemented using MPI [7] and OPENMP [8].
Summary of revisions: (1) Object-oriented programming has been used. (2) A hierarchical parallelization scheme has been adopted. (3) SPME routine has been fully parallelized with parallel 3D FFT using volumetric decomposition scheme [9].
K.J.O. thanks Mr. Seung Min Lee for useful discussion on programming and debugging.
Running time: Running time depends on system size and methods used. For test system containing a protein (PDB id: 5DHFR) with CHARMM22 force field [10] and 7023 TIP3P [11] waters in simulation box having dimension 62.23 Å×62.23 Å×62.23 Å, the benchmark results are given in Fig. 1. Here the potential cutoff distance was set to 12 Å and the switching function was applied from 10 Å for the force calculation in real space. For the SPME [12] calculation, K, K, and K were set to 64 and the interpolation order was set to 4. To do the fast Fourier transform, we used Intel MKL library. All bonds including hydrogen atoms were constrained using SHAKE/RATTLE algorithms [13,14]. The code was compiled using Intel compiler version 11.1 and mvapich2 version 1.5. Fig. 2 shows performance gains from using CUDA-enabled version [15] of mm_par for 5DHFR simulation in water on Intel Core2Quad 2.83 GHz and GeForce GTX 580. Even though mm_par2.0 is not ported yet for GPU, its performance data would be useful to expect mm_par2.0 performance on GPU.
Fig. 1: Timing results for 1000 MD steps. 1, 2, 4, and 8 in the figure mean the number of OPENMP threads.
Fig. 2: Timing results for 1000 MD steps from double precision simulation on CPU, single precision simulation on GPU, and double precision simulation on GPU.
Analysis and selection of optimal function implementations in massively parallel computer

DOEpatents

Archer, Charles Jens [Rochester, MN; Peters, Amanda [Rochester, MN; Ratterman, Joseph D [Rochester, MN

2011-05-31

An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
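The abstract above describes collecting performance data for several implementations of one function and using it to pick an implementation at call time. A minimal sketch of that idea follows, with two loud simplifications: it measures along a single input dimension (the argument length) rather than several, and it builds a dispatching closure instead of generating selection program code. All names (`collect_performance`, `build_selector`, `make_input`) are hypothetical and not taken from the patent.

```python
import time


def collect_performance(implementations, input_sizes, make_input, repeats=3):
    """Time every implementation at every size; return {size: {name: seconds}}."""
    data = {}
    for size in input_sizes:
        data[size] = {}
        for name, func in implementations.items():
            best = float("inf")
            for _ in range(repeats):
                arg = make_input(size)
                start = time.perf_counter()
                func(arg)
                best = min(best, time.perf_counter() - start)
            data[size][name] = best  # keep the best of several runs
    return data


def build_selector(implementations, data):
    """Return a dispatcher that calls the fastest implementation recorded
    for the measured size nearest to the actual argument length."""
    fastest = {size: min(timings, key=timings.get)
               for size, timings in data.items()}
    sizes = sorted(fastest)

    def dispatch(arg):
        nearest = min(sizes, key=lambda s: abs(s - len(arg)))
        return implementations[fastest[nearest]](arg)

    return dispatch


# Hand-written timings so the selection is deterministic in this example.
impls = {"a": lambda x: ("a", len(x)), "b": lambda x: ("b", len(x))}
timings = {10: {"a": 1.0, "b": 2.0}, 1000: {"a": 5.0, "b": 3.0}}
dispatch = build_selector(impls, timings)
```

With these timings, short arguments are routed to implementation "a" and long ones to "b"; in practice the table would come from `collect_performance` run on the target machine.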
The Portals 4.0 network programming interface.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

2012-11-01

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Automating FEA programming

NASA Technical Reports Server (NTRS)

Sharma, Naveen

1992-01-01

In this paper we briefly describe a combined symbolic and numeric approach for solving mathematical models on parallel computers. An experimental software system, PIER, is being developed in Common Lisp to synthesize computationally intensive and domain formulation dependent phases of finite element analysis (FEA) solution methods. Quantities for domain formulation like shape functions, element stiffness matrices, etc., are automatically derived using symbolic mathematical computations. The problem specific information and derived formulae are then used to generate (parallel) numerical code for FEA solution steps. A constructive approach to specify a numerical program design is taken. The code generator compiles application oriented input specifications into (parallel) FORTRAN77 routines with the help of built-in knowledge of the particular problem, numerical solution methods and the target computer.
Development for SSV on a parallel processing system (PARAGON)

NASA Astrophysics Data System (ADS)

Gothard, Benny M.; Allmen, Mark; Carroll, Michael J.; Rich, Dan

1995-12-01

A goal of the surrogate semi-autonomous vehicle (SSV) program is to have multiple vehicles navigate autonomously and cooperatively with other vehicles. This paper describes the process and tools used in porting UGV/SSV (unmanned ground vehicle) autonomous mobility and target recognition algorithms from a SISD (single instruction single data) processor architecture (i.e., a Sun SPARC workstation running C/UNIX) to a MIMD (multiple instruction multiple data) parallel processor architecture (i.e., PARAGON-a parallel set of i860 processors running C/UNIX). It discusses the gains in performance and the pitfalls of such a venture. It also examines the merits of this processor architecture (based on this conceptual prototyping effort) and programming paradigm to meet the final SSV demonstration requirements.
GASPRNG: GPU accelerated scalable parallel random number generator library

NASA Astrophysics Data System (ADS)

Gao, Shuang; Peterson, Gregory D.

2013-04-01

Graphics processors represent a promising technology for accelerating computational science applications. Many computational science applications require fast and scalable random number generation with good statistical properties, so they use the Scalable Parallel Random Number Generators library (SPRNG). We present the GPU Accelerated SPRNG library (GASPRNG) to accelerate SPRNG in GPU-based high performance computing systems. GASPRNG includes code for a host CPU and CUDA code for execution on NVIDIA graphics processing units (GPUs) along with a programming interface to support various usage models for pseudorandom numbers and computational science applications executing on the CPU, GPU, or both. This paper describes the implementation approach used to produce high performance and also describes how to use the programming interface. The programming interface allows a user to be able to use GASPRNG the same way as SPRNG on traditional serial or parallel computers as well as to develop tightly coupled programs executing primarily on the GPU. We also describe how to install GASPRNG and use it. To help illustrate linking with GASPRNG, various demonstration codes are included for the different usage models. GASPRNG on a single GPU shows up to 280x speedup over SPRNG on a single CPU core and is able to scale for larger systems in the same manner as SPRNG. Because GASPRNG generates identical streams of pseudorandom numbers as SPRNG, users can be confident about the quality of GASPRNG for scalable computational science applications.
Catalogue identifier: AEOI_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOI_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: UTK license.
No. of lines in distributed program, including test data, etc.: 167900
No. of bytes in distributed program, including test data, etc.: 1422058
Distribution format: tar.gz
Programming language: C and CUDA.
Computer: Any PC or workstation with NVIDIA GPU (Tested on Fermi GTX480, Tesla C1060, Tesla M2070).
Operating system: Linux with CUDA version 4.0 or later. Should also run on MacOS, Windows, or UNIX.
Has the code been vectorized or parallelized?: Yes. Parallelized using MPI directives.
RAM: 512 MB ~ 732 MB (main memory on host CPU, depending on the data type of random numbers.) / 512 MB (GPU global memory)
Classification: 4.13, 6.5.
Nature of problem: Many computational science applications are able to consume large numbers of random numbers. For example, Monte Carlo simulations are able to consume limitless random numbers for the computation as long as resources for the computing are supported. Moreover, parallel computational science applications require independent streams of random numbers to attain statistically significant results. The SPRNG library provides this capability, but at a significant computational cost. The GASPRNG library presented here accelerates the generators of independent streams of random numbers using graphical processing units (GPUs).
Solution method: Multiple copies of random number generators in GPUs allow a computational science application to consume large numbers of random numbers from independent, parallel streams. GASPRNG is a random number generators library to allow a computational science application to employ multiple copies of random number generators to boost performance. Users can interface GASPRNG with software code executing on microprocessors and/or GPUs.
Running time: The tests provided take a few minutes to run.
Fenix, A Fault Tolerant Programming Framework for MPI Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gamel, Marc; Teranihi, Keita; Valenzuela, Eric

2016-10-05

Fenix provides APIs to allow the users to add fault tolerance capability to MPI-based parallel programs in a transparent manner. Fenix-enabled programs can run through process failures during program execution using a pool of spare processes accommodated by Fenix.
Fully Parallel MHD Stability Analysis Tool

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2014-10-01

Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulating MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. The solution of the eigenvalue problem is parallelized by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
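For readers unfamiliar with inverse iteration, the following toy Python sketch shows the basic loop on a 2x2 real matrix: repeatedly solve the shifted system and normalise, which converges to the eigenvector whose eigenvalue lies closest to the shift. It is not MARS code; the function names, the Cramer's-rule solver and the fixed step count are assumptions chosen to keep the example self-contained. In a production code the linear solve is the expensive step and is the natural target for parallel libraries.

```python
import math


def solve_2x2(m, rhs):
    """Solve a 2x2 linear system with Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det,
            (a * rhs[1] - c * rhs[0]) / det]


def inverse_iteration(matrix, shift, steps=50):
    """Approximate the eigenpair of `matrix` whose eigenvalue is closest to `shift`."""
    shifted = [[matrix[0][0] - shift, matrix[0][1]],
               [matrix[1][0], matrix[1][1] - shift]]
    vec = [1.0, 0.0]
    for _ in range(steps):
        vec = solve_2x2(shifted, vec)          # one solve per iteration
        norm = math.hypot(vec[0], vec[1])
        vec = [vec[0] / norm, vec[1] / norm]   # keep the iterate at unit length
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    av = [matrix[0][0] * vec[0] + matrix[0][1] * vec[1],
          matrix[1][0] * vec[0] + matrix[1][1] * vec[1]]
    eigenvalue = vec[0] * av[0] + vec[1] * av[1]
    return eigenvalue, vec
```

For the symmetric matrix `[[2.0, 1.0], [1.0, 3.0]]` and a shift of `1.0`, the loop settles on the smaller eigenvalue, (5 - sqrt(5)) / 2.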
GPU COMPUTING FOR PARTICLE TRACKING

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nishimura, Hiroshi; Song, Kai; Muriki, Krishna

2011-03-25

This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performance, issues, and challenges arising from introducing the GPU are also discussed. General-purpose computation on graphics processing units (GPGPU) brings massively parallel computing capabilities to numerical calculation. However, the unique architecture of the GPU requires a comprehensive understanding of the hardware and programming model in order to optimize existing applications well. In the field of accelerator physics, the dynamic aperture calculation of a storage ring, which is often the most time-consuming part of accelerator modeling and simulation, can benefit from the GPU because it is embarrassingly parallel, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU, which consists of 14 multiprocessors (MP) with 32 cores on each MP, for a total of 448 cores, to host thousands of threads dynamically. A thread is a logical execution unit of the program on the GPU. In the GPU programming model, threads are grouped into a collection of blocks. Within each block, multiple threads share the same code and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using the GPU to speed up the dynamic aperture calculation by having each thread track a particle.
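The one-particle-per-thread structure described above can be imitated on a CPU with a short Python sketch. It is not TracyGPU: the Henon map stands in for a real lattice, and the tune, loss radius, turn count and function names are all assumed values. The point is only that each starting amplitude is tracked independently, so the work maps directly onto independent tasks. Python threads do not give a real speedup for this arithmetic; the pool is there to show the task layout.

```python
import math
from concurrent.futures import ThreadPoolExecutor

MU = 2 * math.pi * 0.205   # phase advance per turn (assumed value)
COS_MU, SIN_MU = math.cos(MU), math.sin(MU)
LOSS_RADIUS = 10.0         # amplitude beyond which a particle counts as lost


def survives(x0, turns=1000):
    """Track one particle through the Henon map; True if it stays bounded."""
    x, p = x0, 0.0
    for _ in range(turns):
        kick = p + x * x                       # thin nonlinear kick
        x, p = (x * COS_MU + kick * SIN_MU,
                -x * SIN_MU + kick * COS_MU)   # linear rotation by MU
        if abs(x) > LOSS_RADIUS or abs(p) > LOSS_RADIUS:
            return False
    return True


def scan(amplitudes, workers=4):
    """Track every starting amplitude independently, one task per particle."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(survives, amplitudes))
```

Calling `scan` on a list of starting amplitudes returns one survival flag per particle; the largest surviving amplitude is a crude estimate of the dynamic aperture in this toy model.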
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

NASA Astrophysics Data System (ADS)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Parallelization is therefore needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume a square, symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
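A small Python sketch may help make the hypergraph view concrete. It assumes the commonly used column-net model for row-wise partitioning, in which each column is a net connecting the rows that have a nonzero in it and the cost of a partition is the sum over nets of (parts touched - 1). The abstract does not say which model the authors use, so this choice, the sparse-row representation and the function names are assumptions. The example works for rectangular matrices, which is the case the abstract highlights.

```python
def spmv_rows(rows, x, owned_rows):
    """Multiply only the rows owned by one part; each row is a {col: value} dict."""
    return {r: sum(v * x[c] for c, v in rows[r].items()) for r in owned_rows}


def connectivity_minus_one(rows, part_of_row, num_cols):
    """Sum over columns (nets) of (number of parts touching the column - 1)."""
    total = 0
    for col in range(num_cols):
        parts = {part_of_row[r] for r, row in enumerate(rows) if col in row}
        if parts:
            total += len(parts) - 1
    return total
```

With a 3x4 matrix stored as `[{0: 1.0, 1: 2.0}, {1: 3.0, 3: 4.0}, {2: 5.0, 3: 6.0}]`, placing rows 0 and 1 together and row 2 alone gives a cost of 1, while splitting rows 0 and 2 from row 1 gives a cost of 2, so a partitioner minimising this metric would prefer the first assignment.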
Large-scale parallel lattice Boltzmann-cellular automaton model of two-dimensional dendritic growth

NASA Astrophysics Data System (ADS)

Jelinek, Bohumir; Eshraghi, Mohsen; Felicelli, Sergio; Peters, John F.

2014-03-01

An extremely scalable lattice Boltzmann (LB)-cellular automaton (CA) model for simulations of two-dimensional (2D) dendritic solidification under forced convection is presented. The model incorporates effects of phase change, solute diffusion, melt convection, and heat transport. The LB model represents the diffusion, convection, and heat transfer phenomena. The dendrite growth is driven by a difference between actual and equilibrium liquid composition at the solid-liquid interface. The CA technique is deployed to track the new interface cells. The computer program was parallelized using the Message Passing Interface (MPI) technique. Parallel scaling of the algorithm was studied and major scalability bottlenecks were identified. Efficiency loss attributable to the high memory bandwidth requirement of the algorithm was observed when using multiple cores per processor. Parallel writing of the output variables of interest was implemented in the binary Hierarchical Data Format 5 (HDF5) to improve the output performance, and to simplify visualization. Calculations were carried out in single precision arithmetic without significant loss in accuracy, resulting in 50% reduction of memory and computational time requirements. The presented solidification model shows very good scalability up to centimeter-size domains, including more than ten million dendrites. Catalogue identifier: AEQZ_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQZ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, UK Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 29,767 No. of bytes in distributed program, including test data, etc.: 3,131,367 Distribution format: tar.gz Programming language: Fortran 90. Computer: Linux PC and clusters. Operating system: Linux. Has the code been vectorized or parallelized?: Yes. Program is parallelized using MPI. 
Number of processors used: 1-50,000 RAM: Memory requirements depend on the grid size Classification: 6.5, 7.7. External routines: MPI (http://www.mcs.anl.gov/research/projects/mpi/), HDF5 (http://www.hdfgroup.org/HDF5/) Nature of problem: Dendritic growth in undercooled Al-3 wt% Cu alloy melt under forced convection. Solution method: The lattice Boltzmann model solves the diffusion, convection, and heat transfer phenomena. The cellular automaton technique is deployed to track the solid/liquid interface. Restrictions: Heat transfer is calculated uncoupled from the fluid flow. Thermal diffusivity is constant. Unusual features: Novel technique, utilizing periodic duplication of a pre-grown “incubation” domain, is applied for the scaleup test. Running time: Running time varies from minutes to days depending on the domain size and number of computational cores.
A numerical differentiation library exploiting parallel architectures

NASA Astrophysics Data System (ADS)

Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

2009-08-01

We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h^2), and O(h^4), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summary Program title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. 
of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
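As a rough illustration of the finite-difference formulas this record refers to, the Python sketch below implements a first-order forward difference and a second-order central difference. The step-size rules (square root and cube root of machine epsilon, scaled by the magnitude of x) are the usual textbook heuristics for balancing truncation and round-off error; they are an assumption here and not necessarily the rule NDL applies, and the function names are not part of the library.

```python
import math
import sys

EPS = sys.float_info.epsilon   # double-precision machine epsilon


def forward_diff(f, x):
    """First derivative with O(h) truncation error; h ~ sqrt(eps)."""
    h = math.sqrt(EPS) * max(1.0, abs(x))
    return (f(x + h) - f(x)) / h


def central_diff(f, x):
    """First derivative with O(h^2) truncation error; h ~ eps^(1/3)."""
    h = EPS ** (1.0 / 3.0) * max(1.0, abs(x))
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

Each partial derivative of a multivariate function needs its own function evaluations and none depends on another, which is why the abstract can report near-linear speedup when the evaluations are spread across cores.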
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

NASA Astrophysics Data System (ADS)

Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

2016-04-01

Advances in graphics processing unit technology towards parallel architectures [1], comprising thousands of cores and multiple parallel threads, provide the hardware foundation for the rapid processing of various parallel applications in seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are established and decades of data are compiled together [2]. Yet many processes in seismic data analysis are performed on each seismic event independently or on distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, being independent of one another, can be performed in parallel, reducing processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel for comparison, are: fuzzy k-means clustering with expert knowledge [7] in assigning the overall number of clusters; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. 
and Diamantaras, K.: 'Programming and architecture of parallel processing systems', 1st Edition, Eds. Kleidarithmos, 2011 [4] NVIDIA.: 'NVidia CUDA C Programming Guide', version 5.0, NVidia (reference book) [5] Konstantaras, A.: 'Classification of Distinct Seismic Regions and Regional Temporal Modelling of Seismicity in the Vicinity of the Hellenic Seismic Arc', IEEE Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6 (4), pp. 1857-1863, 2013 [6] Konstantaras, A. Varley, M.R.,. Valianatos, F., Collins, G. and Holifield, P.: 'Recognition of electric earthquake precursors using neuro-fuzzy models: methodology and simulation results', Proc. IASTED International Conference on Signal Processing Pattern Recognition and Applications (SPPRA 2002), Crete, Greece, 2002, pp 303-308, 2002 [7] Konstantaras, A., Katsifarakis, E., Maravelakis, E., Skounakis, E., Kokkinos, E. and Karapidakis, E.: 'Intelligent Spatial-Clustering of Seismicity in the Vicinity of the Hellenic Seismic Arc', Earth Science Research, vol. 1 (2), pp. 1-10, 2012 [8] Georgoulas, G., Konstantaras, A., Katsifarakis, E., Stylios, C.D., Maravelakis, E. and Vachtsevanos, G.: '"Seismic-Mass" Density-based Algorithm for Spatio-Temporal Clustering', Expert Systems with Applications, vol. 40 (10), pp. 4183-4189, 2013 [9] Konstantaras, A. J.: 'Expert knowledge-based algorithm for the dynamic discrimination of interactive natural clusters', Earth Science Informatics, 2015 (In Press, see: www.scopus.com) [10] Drakatos, G. and Latoussakis, J.: 'A catalog of aftershock sequences in Greece (1971-1997): Their spatial and temporal characteristics', Journal of Seismology, vol. 5, pp. 137-145, 2001
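The fuzzy k-means algorithm named in the record above assigns each event a degree of membership in every cluster, and this step is independent per event. The Python sketch below shows the standard membership formula for one point. It is a generic illustration, not the authors' Cuda C code; the function names and the default fuzzifier of 2 are assumptions.

```python
import math


def memberships(point, centres, fuzzifier=2.0):
    """Fuzzy k-means membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centres]
    # A point sitting exactly on a centre belongs fully to that cluster.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (fuzzifier - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def assign_all(points, centres):
    """Each point is handled independently, so this loop parallelises trivially."""
    return [memberships(p, centres) for p in points]
```

Because `memberships` reads only its own point and the shared list of centres, the loop in `assign_all` is the kind of per-event work that the abstract describes as suitable for one GPU thread per event.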
Array distribution in data-parallel programs

NASA Technical Reports Server (NTRS)

Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Sheffler, Thomas J.

1994-01-01

We consider distribution at compile time of the array data in a distributed-memory implementation of a data-parallel program written in a language like Fortran 90. We allow dynamic redistribution of data and define a heuristic algorithmic framework that chooses distribution parameters to minimize an estimate of program completion time. We represent the program as an alignment-distribution graph. We propose a divide-and-conquer algorithm for distribution that initially assigns a common distribution to each node of the graph and successively refines this assignment, taking computation, realignment, and redistribution costs into account. We explain how to estimate the effect of distribution on computation cost and how to choose a candidate set of distributions. We present the results of an implementation of our algorithms on several test problems.
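To make the notion of an array distribution and its redistribution cost concrete, here is a minimal Python sketch of two common one-dimensional distributions, block and cyclic, together with a count of the elements that must move when switching from one to the other. The abstract does not list the candidate distributions its algorithm considers, so these two, the function names and the simple move count used as a cost are assumptions for illustration.

```python
def block_owner(index, n, procs):
    """Owner of element `index` when n elements are split into contiguous blocks."""
    block = -(-n // procs)          # ceiling division gives the block size
    return index // block


def cyclic_owner(index, procs):
    """Owner of element `index` under round-robin (cyclic) distribution."""
    return index % procs


def redistribution_moves(n, procs):
    """Count elements that change owner when switching from block to cyclic."""
    return sum(block_owner(i, n, procs) != cyclic_owner(i, procs)
               for i in range(n))
```

For 8 elements on 4 processors, only the first and last elements keep their owner, so 6 elements move. A cost-driven framework of the kind the abstract describes would weigh such a figure against the computation time saved by the new distribution.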
Identification and quantification of carbamate pesticides in dried lime tree flowers by means of excitation-emission molecular fluorescence and parallel factor analysis when quenching effect exists.

PubMed

Rubio, L; Ortiz, M C; Sarabia, L A

2014-04-11

A non-separative, fast and inexpensive spectrofluorimetric method based on the second order calibration of excitation-emission fluorescence matrices (EEMs) was proposed for the determination of carbaryl, carbendazim and 1-naphthol in dried lime tree flowers. The trilinearity property of three-way data was used to handle the intrinsic fluorescence of lime flowers and the difference in the fluorescence intensity of each analyte. It also made it possible to identify each analyte unequivocally. Trilinearity of the data tensor guarantees the uniqueness of the solution obtained through parallel factor analysis (PARAFAC), so the factors of the decomposition match up with the analytes. In addition, an experimental procedure was proposed to identify, with three-way data, the quenching effect produced by the fluorophores of the lime flowers. This procedure also enabled the selection of an adequate dilution of the lime flower extract to minimize the quenching effect so that the three analytes can be quantified. Finally, the analytes were determined using the standard addition method for a calibration whose standards were chosen with a D-optimal design. The three analytes were unequivocally identified by the correlation between the pure spectra and the PARAFAC excitation and emission spectral loadings. The trueness was established by the accuracy line "calculated concentration versus added concentration" in all cases. Better decision limit values (CCα), in x0=0 with the probability of false positive fixed at 0.05, were obtained for the calibration performed in pure solvent: 2.97 μg L(-1) for 1-naphthol, 3.74 μg L(-1) for carbaryl and 23.25 μg L(-1) for carbendazim. The CCα values for the second calibration carried out in matrix were 1.61, 4.34 and 51.75 μg L(-1) respectively; while the values obtained considering only the pure samples as calibration set were: 2.65, 8.61 and 28.7 μg L(-1), respectively. Copyright © 2014 Elsevier B.V. All rights reserved.
Purely electromagnetic spacetimes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ivanov, B. V.

Rainich's program of describing metrics induced by pure electromagnetic fields is implemented in a simpler way by using the Ernst formalism and increasing the symmetry of spacetime. Stationary metrics possessing one, two or three Killing vectors are studied and classified. Three branches of solutions exist. Electromagnetically induced mass terms appear in two of them, including a class of solutions in harmonic functions. The static subcase is discussed too. Relations to other well-known electrovacuum metrics are elucidated.

MIST: An Open Source Environmental Modelling Programming Language Incorporating Easy to Use Data Parallelism.

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2014-05-01

Model Integration System (MIST) is an open-source environmental modelling programming language that directly incorporates data parallelism. The language is designed to enable straightforward programming structures, such as nested loops and conditional statements, to be directly translated into sequences of whole-array (or more generally whole data-structure) operations. MIST thus enables the programmer to use well-understood constructs, directly relating to the mathematical structure of the model, without having to explicitly vectorize code or worry about details of parallelization. A range of common modelling operations are supported by dedicated language structures operating on cell neighbourhoods rather than individual cells (e.g., the 3x3 local neighbourhood needed to implement an averaging image filter can be simply accessed from within a simple loop traversing all image pixels). This facility hides details of inter-process communication behind more mathematically relevant descriptions of model dynamics. The MIST automatic vectorization/parallelization process serves both to distribute work among available nodes and, separately, to control storage requirements for intermediate expressions, enabling operations on very large domains for which memory availability may be an issue. MIST is designed to facilitate efficient interpreter-based implementations. A prototype open source interpreter is available, coded in standard FORTRAN 95, with tools to rapidly integrate existing FORTRAN 77 or 95 code libraries. The language is formally specified and thus not limited to FORTRAN implementation or to an interpreter-based approach. A MIST to FORTRAN compiler is under development and volunteers are sought to create an ANSI-C implementation. Parallel processing is currently implemented using OpenMP. However, parallelization code is fully modularised and could be replaced with implementations using other libraries. GPU implementation is potentially possible.
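The 3x3 averaging filter mentioned in this record is a convenient example of a neighbourhood operation written as a plain nested loop. The sketch below is ordinary Python, not MIST syntax (the abstract does not show any MIST code), and the edge handling, which averages only the neighbours that fall inside the image, is an assumption. Each output pixel depends only on input pixels, so the outer loops could be turned into whole-array or parallel operations by a translator of the kind described.

```python
def mean_filter_3x3(image):
    """Average each pixel with its in-bounds 3x3 neighbours (edges use fewer cells)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
                        count += 1
            out[r][c] = total / count
    return out
```

The function reads from `image` and writes to a separate `out` grid, which keeps every pixel's result independent of the order in which pixels are visited.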
Operators manual for the magnetograph program (section 2)

NASA Technical Reports Server (NTRS)

November, L.; Title, A. M.

1974-01-01

This manual for use of the magnetograph program describes: (1) black box use of the programs; (2) the magtape data formats used; (3) the adjustable control parameters in the program; and (4) the algorithms. With no adjustments to the control parameters, this program may be used purely as a black box. For optimal use, however, the control parameters may be varied. The magtape data formats are of use in adapting other programs to look at raw data or final magnetograph data.
Parallel design patterns for a low-power, software-defined compressed video encoder

NASA Astrophysics Data System (ADS)

Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar

2011-06-01

Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memory-network processor device.
Guidelines for a Training Program for Audiometric Technicians. Report of Working Group 66.

ERIC Educational Resources Information Center

Glorig, Aram, Ed.; And Others

The document outlines a course designed to train audiometric technicians who will conduct pure-tone conduction tests as part of a program on hearing conservation in noise. A minimum of two days is required for the completion of the course. The outline of the training program presents nine topics with an indication of the minimum time required for…
Sequential or parallel decomposed processing of two-digit numbers? Evidence from eye-tracking.

PubMed

Moeller, Korbinian; Fischer, Martin H; Nuerk, Hans-Christoph; Willmes, Klaus

2009-02-01

While reaction time data have shown that decomposed processing of two-digit numbers occurs, there is little evidence about how decomposed processing functions. Poltrock and Schwartz (1984) argued that multi-digit numbers are compared in a sequential digit-by-digit fashion starting at the leftmost digit pair. In contrast, Nuerk and Willmes (2005) favoured parallel processing of the digits constituting a number. These models (i.e., sequential decomposition, parallel decomposition) make different predictions regarding the fixation pattern in a two-digit number magnitude comparison task and can therefore be differentiated by eye fixation data. We tested these models by evaluating participants' eye fixation behaviour while selecting the larger of two numbers. The stimulus set consisted of within-decade comparisons (e.g., 53_57) and between-decade comparisons (e.g., 42_57). The between-decade comparisons were further divided into compatible and incompatible trials (cf. Nuerk, Weger, & Willmes, 2001) and trials with different decade and unit distances. The observed fixation pattern implies that the comparison of two-digit numbers is not executed by sequentially comparing decade and unit digits as proposed by Poltrock and Schwartz (1984) but rather in a decomposed but parallel fashion. Moreover, the present fixation data provide first evidence that digit processing in multi-digit numbers is not a pure bottom-up effect, but is also influenced by top-down factors. Finally, implications for multi-digit number processing beyond the range of two-digit numbers are discussed.
Surgical bedside master console for neurosurgical robotic system.

PubMed

Arata, Jumpei; Kenmotsu, Hajime; Takagi, Motoki; Hori, Tatsuya; Miyagi, Takahiro; Fujimoto, Hideo; Kajita, Yasukazu; Hayashi, Yuichiro; Chinzei, Kiyoyuki; Hashizume, Makoto

2013-01-01

We are currently developing a neurosurgical robotic system that facilitates access to residual tumors and improves brain tumor removal surgical outcomes. The system combines conventional and robotic surgery allowing for a quick conversion between the procedures. This concept requires a new master console that can be positioned at the surgical bedside and be sterilized. The master console was developed using new technologies, such as a parallel mechanism and pneumatic sensors. The parallel mechanism is a purely passive 5-DOF (degrees of freedom) joystick based on the authors' haptic research. The parallel mechanism enables motion input of conventional brain tumor removal surgery with a compact, intuitive interface that can be used in a conventional surgical environment. In addition, the pneumatic sensors implemented on the mechanism provide an intuitive interface and electrically isolate the tool parts from the mechanism so they can be easily sterilized. The 5-DOF parallel mechanism is compact (17 cm width, 19 cm depth, and 15 cm height), provides a 50 x 50 x 50 mm and 90° workspace and is highly backdrivable (0.27 N of resistance force representing the surgical motion). The evaluation tests revealed that the pneumatic sensors can properly measure the suction strength, grasping force, and hand contact. In addition, an installability test showed that the master console can be used in a conventional surgical environment. The proposed master console design was shown to be feasible for operative neurosurgery based on comprehensive testing. This master console is currently being tested for master-slave control with a surgical robotic system.
A Model for Speedup of Parallel Programs

DTIC Science & Technology

1997-01-01

Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static
ParallelStructure: A R Package to Distribute Parallel Runs of the Population Genetics Program STRUCTURE on Multi-Core Computers

PubMed Central

Besnier, Francois; Glover, Kevin A.

2013-01-01

This software package provides an R-based framework to make use of multi-core computers when running analyses in the population genetics program STRUCTURE. It is aimed especially at users of STRUCTURE who deal with numerous and repeated data analyses and who could take advantage of an efficient script to automatically distribute STRUCTURE jobs among multiple processors. It also includes additional functions to divide analyses among combinations of populations within a single data set without the need to manually produce multiple projects, as is currently the case in STRUCTURE. The package consists of two main functions, MPI_structure() and parallel_structure(), as well as an example data file. We compared the performance in computing time for this example data on two computer architectures and showed that the use of the present functions can result in several-fold improvements in terms of computation time. ParallelStructure is freely available at https://r-forge.r-project.org/projects/parallstructure/. PMID:23923012
Performance of a parallel code for the Euler equations on hypercube computers

NASA Technical Reports Server (NTRS)

Barszcz, Eric; Chan, Tony F.; Jesperson, Dennis C.; Tuminaro, Raymond S.

1990-01-01

The performance of hypercubes was evaluated on a computational fluid dynamics problem, and the parallel environment issues that must be addressed were considered, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two-dimensional steady Euler equations describing flow around an airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2 and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
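For readers who have not worked with hypercube machines, the short Python sketch below shows the interconnect rule that defines them: in a d-dimensional hypercube each of the 2^d nodes is linked to the d nodes whose binary label differs in exactly one bit, so a 16-node machine has 4 links per node and a 512-node machine has 9. The function names are illustrative and do not come from the report.

```python
def hypercube_neighbours(node, dimension):
    """Nodes whose binary label differs from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimension)]


def hops(src, dst):
    """Minimum number of links between two nodes (Hamming distance of labels)."""
    return bin(src ^ dst).count("1")
```

The worst-case distance between two nodes equals the dimension, which is one reason message-passing cost on these machines grows only logarithmically with the node count.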
Automating the parallel processing of fluid and structural dynamics calculations

NASA Technical Reports Server (NTRS)

Arpasi, Dale J.; Cole, Gary L.

1987-01-01

The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.
Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.

PubMed

Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano

2014-09-09

A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.
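The block-cyclic distribution mentioned in the abstract above can be illustrated with a minimal one-dimensional sketch. This is not code from BERTHA; the function name and parameters (`block_cyclic_owner`, `block_size`, `num_procs`) are hypothetical, and the mapping shown is the generic textbook block-cyclic rule, assumed here only for illustration.

```python
def block_cyclic_owner(global_index, block_size, num_procs):
    """Map a global index to (owner process, local index) in a 1-D block-cyclic layout.

    Consecutive blocks of `block_size` elements are dealt out to the
    processes round-robin, which balances the load for linear algebra.
    """
    block = global_index // block_size      # which block the element falls in
    owner = block % num_procs               # blocks are assigned cyclically
    local_block = block // num_procs        # how many earlier blocks this owner holds
    local_index = local_block * block_size + global_index % block_size
    return owner, local_index


if __name__ == "__main__":
    # Show the layout of 10 elements over 3 processes with blocks of 2.
    for i in range(10):
        print(i, block_cyclic_owner(i, block_size=2, num_procs=3))
```

A two-dimensional matrix layout would apply the same rule independently to the row and column indices over a process grid.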
Parallel processing for scientific computations

NASA Technical Reports Server (NTRS)

Alkhatib, Hasan S.

1991-01-01

The main contribution of the effort in the last two years is the introduction of the MOPPS system. After doing extensive literature search, we introduced the system which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications on a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.
Parallel Signal Processing and System Simulation using aCe

NASA Technical Reports Server (NTRS)

Dorband, John E.; Aburdene, Maurice F.

2003-01-01

Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C-based parallel language (aCe C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of aCe C and present a signal processing application (FFT).
Parallel computation using boundary elements in solid mechanics

NASA Technical Reports Server (NTRS)

Chien, L. S.; Sun, C. T.

1990-01-01

The inherent parallelism of the boundary element method is shown. The boundary element is formulated by assuming a linear variation of displacements and tractions within a line element. Moreover, the MACSYMA symbolic program is employed to obtain the analytical results for influence coefficients. Three computational components are parallelized in this method to show the speedup and efficiency in computation. The global coefficient matrix is first formed concurrently. Then, the parallel Gaussian elimination solution scheme is applied to solve the resulting system of equations. Finally, and more importantly, the domain solutions of a given boundary value problem are calculated simultaneously. Linear speedups and high efficiencies are shown for solving a demonstration problem on a Sequent Symmetry S81 parallel computing system.
Managing Algorithmic Skeleton Nesting Requirements in Realistic Image Processing Applications: The Case of the SKiPPER-II Parallel Programming Environment's Operating Model

NASA Astrophysics Data System (ADS)

Coudarcher, Rémi; Duculty, Florent; Serot, Jocelyn; Jurie, Frédéric; Derutin, Jean-Pierre; Dhome, Michel

2005-12-01

SKiPPER is a SKeleton-based Parallel Programming EnviRonment being developed since 1996 and running at LASMEA Laboratory, the Blaise-Pascal University, France. The main goal of the project was to demonstrate the applicability of skeleton-based parallel programming techniques to the fast prototyping of reactive vision applications. This paper deals with the special features embedded in the latest version of the project: algorithmic skeleton nesting capabilities and a fully dynamic operating model. Throughout the case study of a complete and realistic image processing application, in which we have pointed out the requirement for skeleton nesting, we are presenting the operating model of this feature. The work described here is one of the few reported experiments showing the application of skeleton nesting facilities for the parallelisation of a realistic application, especially in the area of image processing. The image processing application we have chosen is a 3D face-tracking algorithm from appearance.
Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

PubMed Central

Wang, Min; Tian, Yun

2018-01-01

The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm demonstrates both better edge detection performance and improved time performance. PMID:29861711
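As a rough illustration of how an Otsu threshold could feed Canny's dual threshold, the sketch below computes Otsu's threshold from a grey-level histogram in plain Python. It is not the authors' implementation: the helper names are hypothetical, and the rule that the low threshold is half the high threshold is a common heuristic assumed here, not something stated in the abstract.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance.

    Class 0 holds pixels <= t and class 1 holds pixels > t.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in class 0 so far
    sum0 = 0    # sum of grey levels in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break               # class 1 empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def canny_thresholds(pixels, low_ratio=0.5):
    """Assumed heuristic: high = Otsu threshold, low = low_ratio * high."""
    high = otsu_threshold(pixels)
    return low_ratio * high, high


if __name__ == "__main__":
    image = [10] * 50 + [200] * 50   # toy bimodal "image"
    print(canny_thresholds(image))
```

In a MapReduce setting, a function like `canny_thresholds` would plausibly run inside each map task on one image at a time, since the images are independent of one another.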
NAS Parallel Benchmark. Results 11-96: Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks. 1.0

NASA Technical Reports Server (NTRS)

Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HPF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI (message passing interface)) combinations will be compared, based on the latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors.
In addition, we also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.
Porosity localizing instability in a compacting porous layer in a pure shear flow and the evolution of porosity band wavelength

NASA Astrophysics Data System (ADS)

Butler, S. L.

2010-09-01

A porosity localizing instability occurs in compacting porous media that are subjected to shear if the viscosity of the solid matrix decreases with porosity ( Stevenson, 1989). This instability may have significant consequences for melt transport in regions of partial melt in the mantle and may significantly modify the effective viscosity of the asthenosphere ( Kohlstedt and Holtzman, 2009). Most analyses of this instability have been carried out assuming an imposed simple shear flow (e.g., Spiegelman, 2003; Katz et al., 2006; Butler, 2009). Pure shear can be realized in laboratory experiments and studying the instability in a pure shear flow allows us to test the generality of some of the results derived for simple shear and the flow pattern for pure shear more easily separates the effects of deformation from rotation. Pure shear flows may approximate flows near the tops of mantle plumes near earth's surface and in magma chambers. In this study, we present linear theory and nonlinear numerical model results for a porosity and strain-rate weakening compacting porous layer subjected to pure shear and we investigate the effects of buoyancy-induced oscillations. The linear theory and numerical model will be shown to be in excellent agreement. We will show that melt bands grow at the same angles to the direction of maximum compression as in simple shear and that buoyancy-induced oscillations do not significantly inhibit the porosity localizing instability. In a pure shear flow, bands parallel to the direction of maximum compression increase exponentially in wavelength with time. However, buoyancy-induced oscillations are shown to inhibit this increase in wavelength. In a simple shear flow, bands increase in wavelength when they are in the orientation for growth of the porosity localizing instability. 
Because the amplitude spectrum is always dominated by bands in this orientation, band wavelengths increase with time throughout simple shear simulations until the wavelength becomes similar to one compaction length. Once the wavelength becomes similar to one compaction length, the growth of the amplitude of the band slows and shorter wavelength bands that are increasing in amplitude at a greater rate take over. This may provide a mechanism to explain the experimental observation that band spacing is controlled by the compaction length ( Kohlstedt and Holtzman, 2009).
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Owen, Jeffrey E.

1988-01-01

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
HeNCE: A Heterogeneous Network Computing Environment

DOE PAGES

Beguelin, Adam; Dongarra, Jack J.; Geist, George Al; ...

1994-01-01

Network computing seeks to utilize the aggregate resources of many networked computers to solve a single problem. In so doing it is often possible to obtain supercomputer performance from an inexpensive local area network. The drawback is that network computing is complicated and error prone when done by hand, especially if the computers have different operating systems and data formats and are thus heterogeneous. The heterogeneous network computing environment (HeNCE) is an integrated graphical environment for creating and running parallel programs over a heterogeneous collection of computers. It is built on a lower level package called parallel virtual machine (PVM). The HeNCE philosophy of parallel programming is to have the programmer graphically specify the parallelism of a computation and to automate, as much as possible, the tasks of writing, compiling, executing, debugging, and tracing the network computation. Key to HeNCE is a graphical language based on directed graphs that describe the parallelism and data dependencies of an application. Nodes in the graphs represent conventional Fortran or C subroutines and the arcs represent data and control flow. This article describes the present state of HeNCE, its capabilities, limitations, and areas of future research.

The AIS-5000 parallel processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schmitt, L.A.; Wilson, S.S.

1988-05-01

The AIS-5000 is a commercially available massively parallel processor which has been designed to operate in an industrial environment. It has fine-grained parallelism with up to 1024 processing elements arranged in a single-instruction multiple-data (SIMD) architecture. The processing elements are arranged in a one-dimensional chain that, for computer vision applications, can be as wide as the image itself. This architecture has better cost/performance characteristics than two-dimensional mesh-connected systems. The design of the processing elements and their interconnections as well as the software used to program the system allow a wide variety of algorithms and applications to be implemented. In this paper, the overall architecture of the system is described. Various components of the system are discussed, including details of the processing elements, data I/O pathways and parallel memory organization. A virtual two-dimensional model for programming image-based algorithms for the system is presented. This model is supported by the AIS-5000 hardware and software and allows the system to be treated as a full-image-size, two-dimensional, mesh-connected parallel processor. Performance benchmarks are given for certain simple and complex functions.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Wood, Mitchell; Thompson, Aidan P.

The purpose of this short contribution is to report on the development of a Spectral Neighbor Analysis Potential (SNAP) for tungsten. We have focused on the characterization of elastic and defect properties of the pure material in order to support molecular dynamics simulations of plasma-facing materials in fusion reactors. A parallel genetic algorithm approach was used to efficiently search for fitting parameters optimized against a large number of objective functions. In addition, we have shown that this many-body tungsten potential can be used in conjunction with a simple helium pair potential to produce accurate defect formation energies for the W-He binary system.
M dwarfs: Theoretical work

NASA Technical Reports Server (NTRS)

Mullan, Dermott J.

1987-01-01

Theoretical work on the atmospheres of M dwarfs has progressed along lines parallel to those followed in the study of other classes of stars. Such models have become increasingly sophisticated as improvements in opacities, in the equation of state, and in the treatment of convection were incorporated during the last 15 to 20 years. As a result, spectrophotometric data on M dwarfs can now be fitted rather well by current models. The various attempts at modeling M dwarf photospheres in purely thermal terms are summarized. Some extensions of these models to include the effects of microturbulence and magnetic inhomogeneities are presented.
Live Soap: Stability, Order, and Fluctuations in Apolar Active Smectics

NASA Astrophysics Data System (ADS)

Adhyapak, Tapan Chandra; Ramaswamy, Sriram; Toner, John

2013-03-01

We construct a hydrodynamic theory of noisy, apolar active smectics in bulk suspension or on a substrate. Unlike purely orientationally ordered active fluids, active apolar smectics can be dynamically stable in Stokesian bulk suspensions. Smectic order in these systems is quasilong ranged in dimension d=2 and long ranged in d=3. We predict reentrant Kosterlitz-Thouless melting to an active nematic in our simplest model in d=2, a nonzero second-sound speed parallel to the layers in bulk suspensions, and that there are no giant number fluctuations in either case. We also briefly discuss possible instabilities in these systems.
Evaluation of aural manifestations in temporo-mandibular joint dysfunction.

PubMed

Sobhy, O A; Koutb, A R; Abdel-Baki, F A; Ali, T M; El Raffa, I Z; Khater, A H

2004-08-01

Thirty patients with temporo-mandibular joint dysfunction were selected to investigate the changes in otoacoustic emissions before and after conservative treatment of their temporo-mandibular joints. Pure tone audiometry, transient-evoked otoacoustic emissions (TEOAE), distortion-product otoacoustic emissions (DPOAE) as well as a tinnitus questionnaire were administered to all patients before and after therapy. Therapy was conservative in the form of counselling, physiotherapy, anti-inflammatory agents, muscle relaxants, and occlusal splints. Results indicated insignificant changes in the TEOAEs, whereas there were significant increases in distortion product levels at most of the frequency bands. These results were paralleled by a subjective improvement of tinnitus.
Voltage-spike analysis for a free-running parallel inverter

NASA Technical Reports Server (NTRS)

Lee, F. C. Y.; Wilson, T. G.

1974-01-01

Unwanted and sometimes damaging high-amplitude voltage spikes occur during each half cycle in many transistor saturable-core inverters at the moment when the core saturates and the transistors switch. The analysis shows that spikes are an intrinsic characteristic of certain types of inverters even with negligible leakage inductance and purely resistive load. The small but unavoidable after-saturation inductance of the saturable-core transformer plays an essential role in creating these undesired high-voltage spikes. State-plane analysis provides insight into the complex interaction between core and transistors, and shows the circuit parameters upon which the magnitude of these spikes depends.
Flow cytometry for enrichment and titration in massively parallel DNA sequencing

PubMed Central

Sandberg, Julia; Ståhl, Patrik L.; Ahmadian, Afshin; Bjursell, Magnus K.; Lundeberg, Joakim

2009-01-01

Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for the costly pilot sequencing of samples during titration is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols. PMID:19304748
Asymptotic-preserving Lagrangian approach for modeling anisotropic transport in magnetized plasmas

NASA Astrophysics Data System (ADS)

Chacon, Luis; Del-Castillo-Negrete, Diego

2012-03-01

Modeling electron transport in magnetized plasmas is extremely challenging due to the extreme anisotropy between parallel (to the magnetic field) and perpendicular directions (the transport-coefficient ratio $\chi_\parallel/\chi_\perp \sim 10^{10}$ in fusion plasmas). Recently, a novel Lagrangian Green's function method has been proposed [D. del-Castillo-Negrete, L. Chacón, PRL 106, 195004 (2011); D. del-Castillo-Negrete, L. Chacón, Phys. Plasmas, submitted (2011)] to solve the local and non-local purely parallel transport equation in general 3D magnetic fields. The approach avoids numerical pollution, is inherently positivity-preserving, and is scalable algorithmically (i.e., work per degree-of-freedom is grid-independent). In this poster, we discuss the extension of the Lagrangian Green's function approach to include perpendicular transport terms and sources. We present an asymptotic-preserving numerical formulation, which ensures a consistent numerical discretization temporally and spatially for arbitrary $\chi_\parallel/\chi_\perp$ ratios. We will demonstrate the potential of the approach with various challenging configurations, including the case of transport across a magnetic island in cylindrical geometry.
Analysis of multiple internal reflections in a parallel aligned liquid crystal on silicon SLM.

PubMed

Martínez, José Luis; Moreno, Ignacio; del Mar Sánchez-López, María; Vargas, Asticio; García-Martínez, Pascuala

2014-10-20

Multiple internal reflection effects on the optical modulation of a commercial reflective parallel-aligned liquid-crystal on silicon (PAL-LCoS) spatial light modulator (SLM) are analyzed. The display is illuminated with different wavelengths and different angles of incidence. Non-negligible Fabry-Perot (FP) effect is observed due to the sandwiched LC layer structure. A simplified physical model that quantitatively accounts for the observed phenomena is proposed. It is shown how the expected pure phase modulation response is substantially modified in the following aspects: 1) a coupled amplitude modulation, 2) a non-linear behavior of the phase modulation, 3) some amount of unmodulated light, and 4) a reduction of the effective phase modulation as the angle of incidence increases. Finally, it is shown that multiple reflections can be useful since the effect of a displayed diffraction grating is doubled on a beam that is reflected twice through the LC layer, thus rendering gratings with doubled phase modulation depth.
NavP: Structured and Multithreaded Distributed Parallel Programming

NASA Technical Reports Server (NTRS)

Pan, Lei

2007-01-01

We present Navigational Programming (NavP) -- a distributed parallel programming methodology based on the principles of migrating computations and multithreading. The four major steps of NavP are: (1) Distribute the data using the data communication pattern in a given algorithm; (2) Insert navigational commands for the computation to migrate and follow large-sized distributed data; (3) Cut the sequential migrating thread and construct a mobile pipeline; and (4) Loop back for refinement. NavP is significantly different from the currently prevailing Message Passing (MP) approach. The advantages of NavP include: (1) NavP is structured distributed programming and it does not change the code structure of an original algorithm. This is in sharp contrast to MP, as MP implementations in general do not resemble the original sequential code; (2) NavP implementations are always competitive with the best MPI implementations in terms of performance. In contrast, approaches such as DSM or HPF have so far failed to deliver satisfying performance, even though they are relatively easy to use compared to MP; (3) NavP provides incremental parallelization, which is beyond the reach of MP; and (4) NavP is a unifying approach that allows us to exploit both fine- (multithreading on shared memory) and coarse- (pipelined tasks on distributed memory) grained parallelism. This is in contrast to the currently popular hybrid use of MP+OpenMP, which is known to be complex to use. We present experimental results that demonstrate the effectiveness of NavP.
40 CFR 370.42 - What is Tier II inventory information?

Code of Federal Regulations, 2014 CFR

2014-07-01

... numbers assigned under the Toxic Release Inventory (TRI) and Risk Management Program. If your facility has... Accident Prevention Provisions, also known as the Risk Management Program. (m) The name, mailing address... year. (s) For each hazardous chemical that you are required to report, you must: (1) Pure Chemical...
Parallel computing for probabilistic fatigue analysis

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.

1993-01-01

This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism, to keep a large number of processors busy, and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large-scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Distributed computing feasibility in a non-dedicated homogeneous distributed system

NASA Technical Reports Server (NTRS)

Leutenegger, Scott T.; Sun, Xian-He

1993-01-01

The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks, is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It is proposed that task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

NASA Technical Reports Server (NTRS)

Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

2017-01-01

Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
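The reason Jacobian evaluation counts as embarrassingly parallel is that each finite-difference column depends only on the base point and one perturbed coordinate. The sketch below shows that structure in Python rather than MATLAB; it is not the authors' spmd code, and the function names, the forward-difference step `h`, and the use of a thread pool are assumptions made for illustration. In CPython, threads will not speed up pure-Python arithmetic, so the example shows the decomposition, not a performance result.

```python
from concurrent.futures import ThreadPoolExecutor


def jacobian_column(f, x, fx, j, h=1e-6):
    """Forward-difference approximation of column j of the Jacobian of f at x."""
    x_pert = list(x)
    x_pert[j] += h                       # perturb only coordinate j
    f_pert = f(x_pert)
    return [(fp - f0) / h for fp, f0 in zip(f_pert, fx)]


def parallel_jacobian(f, x, workers=4, h=1e-6):
    """Compute all Jacobian columns as independent tasks and return a list of rows."""
    fx = f(x)                            # base evaluation shared by every column
    with ThreadPoolExecutor(max_workers=workers) as pool:
        columns = list(pool.map(lambda j: jacobian_column(f, x, fx, j, h),
                                range(len(x))))
    return [list(row) for row in zip(*columns)]   # transpose columns into rows


def example_system(x):
    """Hypothetical test system: F(x) = [x0^2 + x1, x0 * x1]."""
    return [x[0] ** 2 + x[1], x[0] * x[1]]


if __name__ == "__main__":
    for row in parallel_jacobian(example_system, [1.0, 2.0]):
        print(row)
```

For the test system the analytic Jacobian at (1, 2) is [[2, 1], [2, 1]], which the forward-difference result approximates to within the truncation error of the step size.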
Estimating vapor pressures of pure liquids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Haraburda, S.S.

1996-03-01

Calculating the vapor pressures for pure liquid chemicals is a key step in designing equipment for separation of liquid mixtures. Here is a useful way to develop an equation for predicting vapor pressures over a range of temperatures. The technique uses known vapor pressure points for different temperatures. Although a vapor-pressure equation is being showcased in this article, the basic method has much broader applicability -- in fact, users can apply it to develop equations for any temperature-dependent model. The method can be easily adapted for use in software programs for mathematics evaluation, minimizing the need for any programming. The model used is the Antoine equation, which typically provides a good correlation with experimental or measured data.
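As a hedged illustration of the idea in the abstract above, the Antoine equation log10(P) = A - B / (T + C) has three constants, so three known (temperature, pressure) points determine them exactly. The sketch below solves for A, B, and C in closed form. This exact three-point solve is a simplification chosen here and is not necessarily the article's procedure, which may use regression over more points; the function names and the sample coefficients are illustrative assumptions.

```python
import math


def antoine_pressure(T, A, B, C):
    """Vapor pressure from the Antoine equation: log10(P) = A - B / (T + C)."""
    return 10.0 ** (A - B / (T + C))


def fit_antoine(points):
    """Solve for (A, B, C) from exactly three (T, P) points with distinct T."""
    (T1, P1), (T2, P2), (T3, P3) = points
    y1, y2, y3 = (math.log10(P) for P in (P1, P2, P3))
    # Eliminating A and B gives r = (T3 + C) / (T1 + C), which is linear in C.
    r = (y1 - y2) / (y2 - y3) * (T2 - T3) / (T1 - T2)
    C = (T3 - r * T1) / (r - 1.0)
    B = (y1 - y2) / (1.0 / (T2 + C) - 1.0 / (T1 + C))
    A = y1 + B / (T1 + C)
    return A, B, C


if __name__ == "__main__":
    # Illustrative constants (assumed; T in degrees C, P in mmHg).
    true_abc = (8.07131, 1730.63, 233.426)
    data = [(T, antoine_pressure(T, *true_abc)) for T in (20.0, 60.0, 100.0)]
    print(fit_antoine(data))
```

With noisy measurements or more than three points, a least-squares fit would be the more robust choice; the closed-form solve is shown only because it makes the structure of the equation explicit.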
Nemesis I: Parallel Enhancements to ExodusII

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hennigan, Gary L.; John, Matthew S.; Shadid, John N.

2006-03-28

NEMESIS I is an enhancement to the EXODUS II finite element database model used to store and retrieve data for unstructured parallel finite element analyses. NEMESIS I adds data structures which facilitate the partitioning of a scalar (standard serial) EXODUS II file onto parallel disk systems found on many parallel computers. Since the NEMESIS I application programming interface (API) can be used to append information to an existing EXODUS II database, any existing software that reads EXODUS II files can be used on files which contain NEMESIS I information. The NEMESIS I information is written and read via C or C++ callable functions which comprise the NEMESIS I API.
Discrete-continuous variable structural synthesis using dual methods

NASA Technical Reports Server (NTRS)

Schmit, L. A.; Fleury, C.

1980-01-01

Approximation concepts and dual methods are extended to solve structural synthesis problems involving a mix of discrete and continuous sizing type of design variables. Pure discrete and pure continuous variable problems can be handled as special cases. The basic mathematical programming statement of the structural synthesis problem is converted into a sequence of explicit approximate primal problems of separable form. These problems are solved by constructing continuous explicit dual functions, which are maximized subject to simple nonnegativity constraints on the dual variables. A newly devised gradient projection type of algorithm called DUAL 1, which includes special features for handling dual function gradient discontinuities that arise from the discrete primal variables, is used to find the solution of each dual problem. Computational implementation is accomplished by incorporating the DUAL 1 algorithm into the ACCESS 3 program as a new optimizer option. The power of the method set forth is demonstrated by presenting numerical results for several example problems, including a pure discrete variable treatment of a metallic swept wing and a mixed discrete-continuous variable solution for a thin delta wing with fiber composite skins.
High Performance Fortran for Aerospace Applications

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Zima, Hans; Bushnell, Dennis M. (Technical Monitor)

2000-01-01

This paper focuses on the use of High Performance Fortran (HPF) for important classes of algorithms employed in aerospace applications. HPF is a set of Fortran extensions designed to provide users with a high-level interface for programming data parallel scientific applications, while delegating to the compiler/runtime system the task of generating explicitly parallel message-passing programs. We begin by providing a short overview of the HPF language. This is followed by a detailed discussion of the efficient use of HPF for applications involving multiple structured grids such as multiblock and adaptive mesh refinement (AMR) codes as well as unstructured grid codes. We focus on the data structures and computational structures used in these codes and on the high-level strategies that can be expressed in HPF to optimally exploit the parallelism in these algorithms.
The portals 4.0.1 network programming interface.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

2013-04-01

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Merlin - Massively parallel heterogeneous computing

NASA Technical Reports Server (NTRS)

Wittie, Larry; Maples, Creve

1989-01-01

Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
