Science.gov

Sample records for not4p functions parallel

  1. Functional & para-functional parallel processing

    SciTech Connect

    Not Available

    1994-11-01

    For years (about 20, in fact) dataflow researchers have argued for the use of dataflow (a subset of functional) languages for parallel computing, resting their proof on the ability to construct large-scale dataflow machines to realize the inherent parallelism in functional programs. Unfortunately, such machines have never materialized as commercial products - instead, the market shows a vast variety of parallel multiprocessors that require special skills to program. It may be the case that these machines reflect a wrong direction in computer architecture design, and it may be the case that dataflow machines are the right way to go, but the proof is in the pudding, and thus far there does not exist even a prototype dataflow machine that can prove the "dataflow thesis." Under the circumstances it would seem rather foolhardy simply to ignore the commercial parallel machines that are available now, regardless of one's favorite programming methodology or concurrency model. It has been the authors' thesis that one can in fact use such machines effectively, while maintaining the concomitant thesis that functional programming is good for parallel computation. During the last two years the author has made considerable progress to support this two-fold thesis, and is now prepared to extend this work in several ways. The authors' particular interest, and presumably the primary interest to DOE, is to concentrate the work in the area of scientific computing, including functional language features, program development tools, and systems support tailored for scientific computing applications. The authors' desire to do this reflects confidence that this approach really will work for scientific computing - the author has spent two years proving the viability of the ideas, and now it's time to put them into action.

  2. Functional MRI Using Regularized Parallel Imaging Acquisition

    PubMed Central

    Lin, Fa-Hsuan; Huang, Teng-Yi; Chen, Nan-Kuei; Wang, Fu-Nien; Stufflebeam, Steven M.; Belliveau, John W.; Wald, Lawrence L.; Kwong, Kenneth K.

    2013-01-01

    Parallel MRI techniques reconstruct full-FOV images from undersampled k-space data by using the uncorrelated information from RF array coil elements. One disadvantage of parallel MRI is that the image signal-to-noise ratio (SNR) is degraded because of the reduced data samples and the spatially correlated nature of multiple RF receivers. Regularization has been proposed to mitigate the SNR loss arising from the latter. Since regularization requires a static prior, the dynamic contrast-to-noise ratio (CNR) in parallel MRI will be affected. In this paper we investigate the CNR of regularized sensitivity encoding (SENSE) acquisitions. We propose to implement regularized parallel MRI acquisitions in functional MRI (fMRI) experiments by incorporating the prior from combined segmented echo-planar imaging (EPI) acquisition into SENSE reconstructions. We investigated the impact of regularization on the CNR by performing parametric simulations at various BOLD contrasts, acceleration rates, and sizes of the active brain areas. As quantified by receiver operating characteristic (ROC) analysis, the simulations suggest that the detection power of SENSE fMRI can be improved by regularized reconstructions, compared to unregularized reconstructions. Human motor and visual fMRI data acquired at different field strengths and array coils also demonstrate that regularized SENSE improves the detection of functionally active brain regions. PMID:16032694

  3. Highly parallel oligonucleotide purification and functionalization using reversible chemistry

    PubMed Central

    York, Kerri T.; Smith, Ryan C.; Yang, Rob; Melnyk, Peter C.; Wiley, Melissa M.; Turk, Casey M.; Ronaghi, Mostafa; Gunderson, Kevin L.; Steemers, Frank J.

    2012-01-01

    We have developed a cost-effective, highly parallel method for purification and functionalization of 5′-labeled oligonucleotides. The approach is based on 5′-hexa-His phase tag purification, followed by exchange of the hexa-His tag for a functional group using reversible reaction chemistry. These methods are suitable for large-scale (micromole to millimole) production of oligonucleotides and are amenable to highly parallel processing of many oligonucleotides individually or in high complexity pools. Examples of the preparation of 5′-biotin 95-mer oligonucleotide pools of >40K complexity at micromole scale are shown. These pools are prepared in up to ~16% yield and 90–99% purity. Approaches for using this method in other applications are also discussed. PMID:22039155

  5. Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments

    NASA Astrophysics Data System (ADS)

    Atwal, Gurinder S.; Kinney, Justin B.

    2016-03-01

    A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
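
    This inference problem is concrete enough to sketch. The toy code below fits a linear sequence-to-activity model by maximizing a histogram estimate of mutual information instead of a likelihood; the model, the random-search optimizer, and the names (mutual_information, fit_by_mi) are illustrative assumptions, not the authors' implementation.

    ```python
    import numpy as np

    def mutual_information(x, y, bins=20):
        # Crude histogram estimate of I(X;Y) in nats; no bias correction.
        pxy, _, _ = np.histogram2d(x, y, bins=bins)
        pxy = pxy / pxy.sum()
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0
        return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

    def fit_by_mi(seqs_onehot, activity, n_iter=2000, seed=0):
        # Random-search fit of theta in E = seqs_onehot @ theta, maximizing
        # I(E; activity) rather than a likelihood, per the idea above.
        rng = np.random.default_rng(seed)
        theta = rng.normal(size=seqs_onehot.shape[1])
        best = mutual_information(seqs_onehot @ theta, activity)
        for _ in range(n_iter):
            cand = theta + 0.1 * rng.normal(size=theta.shape)
            mi = mutual_information(seqs_onehot @ cand, activity)
            if mi > best:
                theta, best = cand, mi
        return theta, best
    ```

    Because mutual information is invariant under invertible transformations of the model output, parameter directions that merely reparameterize the prediction are left unconstrained, which is one way to picture the "diffeomorphic modes" described above.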

  6. Administering truncated receive functions in a parallel messaging interface

    SciTech Connect

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-12-09

    Administering truncated receive functions in a parallel messaging interface (`PMI`) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.
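
    The claimed mechanism is compact enough to sketch. The Python below mimics the described semantics with hypothetical names (PMIEndpoint, recv_truncated); the patent itself concerns a PMI running on the compute nodes of a parallel computer, not a Python class.

    ```python
    class PMIEndpoint:
        """Toy stand-in for a parallel messaging interface endpoint."""
        def __init__(self):
            self._inbox = []           # queue of fully received byte buffers

        def deliver(self, data: bytes):
            # Called by the transport: the PMI always receives the full quantity.
            self._inbox.append(data)

        def recv_truncated(self, keep: int) -> bytes:
            # Hand the application only the portion it asked to receive and
            # discard the remainder, as in the scheme described above.
            data = self._inbox.pop(0)
            kept, discarded = data[:keep], data[keep:]
            del discarded              # the application never sees this portion
            return kept

    ep = PMIEndpoint()
    ep.deliver(b"0123456789")
    assert ep.recv_truncated(keep=4) == b"0123"
    ```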

  7. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    SciTech Connect

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings: High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  8. Parallel functional programming in Sisal: Fictions, facts, and future

    SciTech Connect

    McGraw, J.R.

    1993-07-01

    This paper provides a status report on the progress of research and development on the functional language Sisal. This project focuses on providing a highly effective method of writing large scientific applications that can efficiently execute on a spectrum of different multiprocessors. The paper includes sections on the language definition, compilation strategies, and programming techniques intended for readers with little or no background with Sisal. The section on performance presents our most recent results on execution speed for shared-memory multiprocessors, our findings using Sisal to develop codes, and our experiences migrating the same source code to different machines. For large programs, the execution performance of Sisal (with minimal supporting advice from the programmer) usually exceeds that of the best available automatic, vector/parallel Fortran compilers. Our evidence also indicates that Sisal programs tend to be shorter in length, faster to write, and clearer to understand than equivalent algorithms in Fortran. The paper concludes with a substantial discussion of common criticisms of the language and our plans for addressing them. Most notably, efficient implementations for distributed memory machines are lacking, an issue we plan to remedy.

  9. Functional networks in parallel with cortical development associate with executive functions in children.

    PubMed

    Zhong, Jidan; Rifkin-Graboi, Anne; Ta, Anh Tuan; Yap, Kar Lai; Chuang, Kai-Hsiang; Meaney, Michael J; Qiu, Anqi

    2014-07-01

    Children begin performing similarly to adults on tasks requiring executive functions in late childhood, a transition that is probably due to neuroanatomical fine-tuning processes, including myelination and synaptic pruning. In parallel to such structural changes in neuroanatomical organization, development of functional organization may also be associated with cognitive behaviors in children. We examined 6- to 10-year-old children's cortical thickness, functional organization, and cognitive performance. We used structural magnetic resonance imaging (MRI) to identify areas with cortical thinning, resting-state fMRI to identify functional organization in parallel to cortical development, and working memory/response inhibition tasks to assess executive functioning. We found that neuroanatomical changes in the form of cortical thinning spread over bilateral frontal, parietal, and occipital regions. These regions were engaged in 3 functional networks: sensorimotor and auditory, executive control, and default mode network. Furthermore, we found that working memory and response inhibition only associated with regional functional connectivity, but not topological organization (i.e., local and global efficiency of information transfer) of these functional networks. Interestingly, functional connections associated with "bottom-up" as opposed to "top-down" processing were more clearly related to children's performance on working memory and response inhibition, implying an important role for brain systems involved in late childhood. PMID:23448875

  10. A two-level parallel direct search implementation for arbitrarily sized objective functions

    SciTech Connect

    Hutchinson, S.A.; Shadid, N.; Moffat, H.K.

    1994-12-31

    In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p_2) of processors. If p, the number of processors available, is greater than or equal to 2p_2, then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency, and further speedup may then be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for an nCUBE 2 implementation.
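
    A minimal sketch of the two-level idea follows, assuming p processors with p_2 per objective evaluation, and therefore p // p_2 concurrently evaluated search points. Threads stand in for processors here, and the pattern-search logic is deliberately simplified; names such as parallel_direct_search are illustrative, not from the paper.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def objective(x):
        # Stand-in for an expensive finite-element objective that would itself
        # occupy p_2 processors; here it is a cheap analytic function.
        return (x - 3.0) ** 2

    def parallel_direct_search(p=8, p2=2, x0=0.0, step=5.0, iters=40):
        # Outer level: p // p_2 pattern points evaluated concurrently.
        n_points = max(p // p2, 1)
        best_x, best_f = x0, objective(x0)
        with ThreadPoolExecutor(max_workers=n_points) as pool:
            for _ in range(iters):
                xs = [best_x + step * (i - (n_points - 1) / 2)
                      for i in range(n_points)]
                fs = list(pool.map(objective, xs))
                i = min(range(n_points), key=fs.__getitem__)
                if fs[i] < best_f:
                    best_x, best_f = xs[i], fs[i]
                else:
                    step /= 2            # contract the search pattern
        return best_x, best_f

    print(parallel_direct_search())      # converges near x = 3
    ```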

  11. Efficient time-dependent density functional theory approximations for hybrid density functionals: analytical gradients and parallelization.

    PubMed

    Petrenko, Taras; Kossmann, Simone; Neese, Frank

    2011-02-01

    In this paper, we present the implementation of efficient approximations to time-dependent density functional theory (TDDFT) within the Tamm-Dancoff approximation (TDA) for hybrid density functionals. For the calculation of the TDDFT/TDA excitation energies and analytical gradients, we combine the resolution of identity (RI-J) algorithm for the computation of the Coulomb terms and the recently introduced "chain of spheres exchange" (COSX) algorithm for the calculation of the exchange terms. It is shown that for extended basis sets, the RIJCOSX approximation leads to speedups of up to 2 orders of magnitude compared to traditional methods, as demonstrated for hydrocarbon chains. The accuracy of the adiabatic transition energies, excited state structures, and vibrational frequencies is assessed on a set of 27 excited states for 25 molecules with the configuration interaction singles and hybrid TDDFT/TDA methods using various basis sets. Compared to the canonical values, the typical error in transition energies is of the order of 0.01 eV. Similar to the ground-state results, excited state equilibrium geometries differ by less than 0.3 pm in the bond distances and 0.5° in the bond angles from the canonical values. The typical error in the calculated excited state normal coordinate displacements is of the order of 0.01, and relative error in the calculated excited state vibrational frequencies is less than 1%. The errors introduced by the RIJCOSX approximation are, thus, insignificant compared to the errors related to the approximate nature of the TDDFT methods and basis set truncation. For TDDFT/TDA energy and gradient calculations on Ag-TB2-helicate (156 atoms, 2732 basis functions), it is demonstrated that the COSX algorithm parallelizes almost perfectly (speedup ~26-29 for 30 processors). The exchange-correlation terms also parallelize well (speedup ~27-29 for 30 processors). The solution of the Z-vector equations shows a speedup of ~24 on 30 processors.

  13. Method, systems, and computer program products for implementing function-parallel network firewall

    DOEpatents

    Fulp, Errin W.; Farley, Ryan J.

    2011-10-11

    Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.
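
    The scheme lends itself to a short sketch. The code below partitions a toy rule set across nodes and combines per-node verdicts; the rule semantics and precedence handling are simplified placeholders, and all names are hypothetical rather than taken from the patent.

    ```python
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Rule:
        src_prefix: str
        action: str                       # "accept" or "deny"

    RULESET = [Rule("10.", "accept"), Rule("192.168.", "accept"),
               Rule("172.16.", "deny"), Rule("0.", "deny")]

    def partition(rules, n_nodes):
        # Each node gets less than all of the rules; together they cover the set.
        return [rules[i::n_nodes] for i in range(n_nodes)]

    def node_filter(rules, packet_src):
        for r in rules:
            if packet_src.startswith(r.src_prefix):
                return r.action
        return None                       # this node holds no matching rule

    def firewall_decision(partitions, packet_src, default="deny"):
        # Conceptually each node filters in parallel; a real function-parallel
        # firewall handles rule precedence more carefully than "first verdict".
        verdicts = (node_filter(part, packet_src) for part in partitions)
        return next((v for v in verdicts if v is not None), default)

    parts = partition(RULESET, n_nodes=2)
    print(firewall_decision(parts, "192.168.1.5"))   # -> accept
    ```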

  14. Parallel sites implicate functional convergence of the hearing gene prestin among echolocating mammals.

    PubMed

    Liu, Zhen; Qi, Fei-Yan; Zhou, Xin; Ren, Hai-Qing; Shi, Peng

    2014-09-01

    Echolocation is a sensory system whereby certain mammals navigate and forage using sound waves, usually in environments where visibility is limited. Curiously, echolocation has evolved independently in bats and whales, which occupy entirely different environments. Based on this phenotypic convergence, recent studies identified several echolocation-related genes with parallel sites at the protein sequence level among different echolocating mammals, and among these, prestin seems the most promising. Although previous studies analyzed the evolutionary mechanism of prestin, the functional roles of the parallel sites in the evolution of mammalian echolocation are not clear. By functional assays, we show that a key parameter of prestin function, 1/α, is increased in all echolocating mammals and that the N7T parallel substitution accounted for this functional convergence. Moreover, another parameter, V1/2, was shifted toward the depolarization direction in a toothed whale, the bottlenose dolphin (Tursiops truncatus) and a constant-frequency (CF) bat, the Stoliczka's trident bat (Aselliscus stoliczkanus). The parallel site of I384T between toothed whales and CF bats was responsible for this functional convergence. Furthermore, the two parameters (1/α and V1/2) were correlated with mammalian high-frequency hearing, suggesting that the convergent changes of the prestin function in echolocating mammals may play important roles in mammalian echolocation. To our knowledge, these findings present the functional patterns of echolocation-related genes in echolocating mammals for the first time and rigorously demonstrate adaptive parallel evolution at the protein sequence level, paving the way to insights into the molecular mechanism underlying mammalian echolocation. PMID:24951728

  15. Serial and Parallel Attentive Visual Searches: Evidence from Cumulative Distribution Functions of Response Times

    ERIC Educational Resources Information Center

    Sung, Kyongje

    2008-01-01

    Participants searched a visual display for a target among distractors. Each of 3 experiments tested a condition proposed to require attention and for which certain models propose a serial search. Serial versus parallel processing was tested by examining effects on response time means and cumulative distribution functions. In 2 conditions, the…

  16. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    Charon is a software toolkit that enables engineers to develop high-performing message-passing programs in a convenient and piecemeal fashion. Emphasis is on rapid program development and prototyping. In this report a detailed description of the functional design of the toolkit is presented. It is illustrated by the stepwise parallelization of two representative code examples.

  17. Analysis and selection of optimal function implementations in massively parallel computer

    DOEpatents

    Archer, Charles Jens; Peters, Amanda; Ratterman, Joseph D.

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
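
    A minimal sketch of the idea, under obvious simplifications: profile a set of interchangeable implementations across one input dimension, then generate a selector that dispatches to whichever implementation profiled fastest at a comparable input size. The function names and the single input dimension are illustrative assumptions.

    ```python
    import time, bisect

    def timeit(f, data):
        t0 = time.perf_counter(); f(data); return time.perf_counter() - t0

    def reverse_slice(xs):   return xs[::-1]
    def reverse_builtin(xs): return list(reversed(xs))
    IMPLEMENTATIONS = [reverse_slice, reverse_builtin]   # hypothetical variants

    def profile(impls, sizes, trials=3):
        # Collect performance data across one input dimension (input length).
        table = {}
        for n in sizes:
            data = list(range(n))
            table[n] = min(impls,
                           key=lambda f: min(timeit(f, data) for _ in range(trials)))
        return table

    def make_selector(table):
        # The generated "selection code": dispatch to the implementation that
        # profiled fastest at a comparable input size.
        sizes = sorted(table)
        def selector(xs):
            i = min(bisect.bisect(sizes, len(xs)), len(sizes) - 1)
            return table[sizes[i]](xs)
        return selector

    select = make_selector(profile(IMPLEMENTATIONS, sizes=[10, 1000, 100000]))
    print(select(list(range(500)))[:3])               # uses the profiled winner
    ```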

  18. DGDFT: A massively parallel method for large scale density functional theory calculations

    SciTech Connect

    Hu, Wei Yang, Chao; Lin, Lin

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  19. Hyperspectral band selection based on parallel particle swarm optimization and impurity function band prioritization schemes

    NASA Astrophysics Data System (ADS)

    Chang, Yang-Lang; Liu, Jin-Nan; Chen, Yen-Lin; Chang, Wen-Yen; Hsieh, Tung-Ju; Huang, Bormin

    2014-01-01

    In recent years, satellite imaging technologies have resulted in an increased number of bands acquired by hyperspectral sensors, greatly advancing the field of remote sensing. Accordingly, owing to the increasing number of bands, band selection in hyperspectral imagery for dimension reduction is important. This paper presents a framework for band selection in hyperspectral imagery that uses two techniques, referred to as particle swarm optimization (PSO) band selection and the impurity function band prioritization (IFBP) method. With the PSO band selection algorithm, highly correlated bands of hyperspectral imagery can first be grouped into modules to coarsely reduce high-dimensional datasets. Then, these highly correlated band modules are analyzed with the IFBP method to finely select the most important feature bands from the hyperspectral imagery dataset. However, PSO band selection is a time-consuming procedure when the number of hyperspectral bands is very large. Hence, this paper proposes a parallel computing version of PSO, namely parallel PSO (PPSO), using a modern graphics processing unit (GPU) architecture with NVIDIA's compute unified device architecture technology to improve the computational speed of PSO processes. The natural parallelism of the proposed PPSO lies in the fact that each particle can be regarded as an independent agent. Parallel computation benefits the algorithm by providing each agent with a parallel processor. The intrinsic parallel characteristics embedded in PPSO are, therefore, suitable for parallel computation. The effectiveness of the proposed PPSO is evaluated through the use of airborne visible/infrared imaging spectrometer hyperspectral images. The performance of PPSO is validated using the supervised K-nearest neighbor classifier. The experimental results demonstrate that the proposed PPSO/IFBP band selection method can not only improve computational speed, but also offer a satisfactory classification performance.
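
    The observation that each particle is an independent agent is easy to make concrete. In the sketch below, the fitness of all particles is evaluated in parallel inside an otherwise standard PSO loop; a thread pool stands in for the GPU, and the quadratic fitness is a placeholder for the band-module evaluation, so everything here is illustrative rather than the paper's CUDA implementation.

    ```python
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    def fitness(x):
        # Placeholder for the per-particle band-selection objective.
        return float(np.sum((x - 0.5) ** 2))

    def parallel_pso(dim=8, n_particles=32, iters=50, w=0.7, c1=1.4, c2=1.4, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.random((n_particles, dim))
        v = np.zeros_like(x)
        pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
        g = pbest[np.argmin(pbest_f)].copy()
        with ThreadPoolExecutor() as pool:        # one "processor" per particle
            for _ in range(iters):
                r1, r2 = rng.random(x.shape), rng.random(x.shape)
                v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
                x = x + v
                f = np.fromiter(pool.map(fitness, x), dtype=float)  # parallel step
                improved = f < pbest_f
                pbest[improved], pbest_f[improved] = x[improved], f[improved]
                g = pbest[np.argmin(pbest_f)].copy()
        return g, float(pbest_f.min())
    ```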

  20. Storing files in a parallel computing system based on user-specified parser function

    DOEpatents

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

    2014-10-21

    Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.
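
    The patent's flow - obtain a parser from the application, let it filter files and extract metadata, then place files on storage nodes - can be sketched as follows. All names (example_parser, store_files) and the hash-based node placement are hypothetical illustrations, not the patented implementation.

    ```python
    import hashlib, json, os

    def example_parser(name, payload):
        # Keep only files satisfying a semantic requirement (non-empty here)
        # and extract metadata to be stored alongside for later searching.
        if not payload:
            return False, None
        return True, {"name": name, "bytes": len(payload)}

    def store_files(files, parser, storage_root, n_nodes=4):
        stored = []
        for name, payload in files.items():
            keep, meta = parser(name, payload)
            if not keep:
                continue                       # fails the parser's semantic check
            node = int(hashlib.md5(name.encode()).hexdigest(), 16) % n_nodes
            path = os.path.join(storage_root, f"node{node}", name)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(payload)
            with open(path + ".meta.json", "w") as f:
                json.dump(meta, f)             # searchable metadata sidecar
            stored.append(path)
        return stored

    store_files({"a.dat": b"x" * 10, "b.dat": b""}, example_parser, "/tmp/pfs")
    ```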

  1. Optimization of a parallel permutation testing function for the SPRINT R package.

    PubMed

    Petrou, Savvas; Sloan, Terence M; Mewissen, Muriel; Forster, Thorsten; Piotrowski, Michal; Dobrzelecki, Bartosz; Ghazal, Peter; Trew, Arthur; Hill, Jon

    2011-12-10

    The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function has close to optimal scaling on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, and so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms including cloud resources and a common desktop machine with multiprocessing capabilities. Copyright © 2011 John Wiley & Sons, Ltd. PMID:23335858

  2. Structure-Function Relationships of Postnatal Tendon Development: A Parallel to Healing

    PubMed Central

    Connizzo, Brianne K.; Yannascoli, Sarah M.; Soslowsky, Louis J.

    2013-01-01

    This review highlights recent research on structure-function relationships in tendon and comments on the parallels between development and healing. The processes of tendon development and collagen fibrillogenesis are reviewed, but due to the abundance of information in this field, this work focuses primarily on characterizing the mechanical behavior of mature and developing tendon, and how the latter parallels healing tendon. The role that extracellular matrix components, mainly collagen, proteoglycans, and collagen cross-links, play in determining the mechanical behavior of tendon will be examined in this review. Specifically, collagen fiber re-alignment and collagen fibril uncrimping relate mechanical behavior to structural alterations during development and during healing. Finally, attention is paid to a number of recent efforts to augment injured tendon and how future efforts could focus on recreating the important structure-function relationships reviewed here. PMID:23357642

  4. Micro/Nanoscale Parallel Patterning of Functional Biomolecules, Organic Fluorophores and Colloidal Nanocrystals

    NASA Astrophysics Data System (ADS)

    Sabella, S.; Brunetti, V.; Vecchio, G.; Torre, A. Della; Rinaldi, R.; Cingolani, R.; Pompa, P. P.

    2009-10-01

    We describe the design and optimization of a reliable strategy that combines self-assembly and lithographic techniques, leading to very precise micro-/nanopositioning of biomolecules for the realization of micro- and nanoarrays of functional DNA and antibodies. Moreover, based on the covalent immobilization of stable and versatile SAMs of programmable chemical reactivity, this approach constitutes a general platform for the parallel site-specific deposition of a wide range of molecules such as organic fluorophores and water-soluble colloidal nanocrystals.

  5. Parallel functional category deficits in clauses and nominal phrases: The case of English agrammatism

    PubMed Central

    Wang, Honglei; Yoshida, Masaya; Thompson, Cynthia K.

    2015-01-01

    Individuals with agrammatic aphasia exhibit restricted patterns of impairment of functional morphemes, however, syntactic characterization of the impairment is controversial. Previous studies have focused on functional morphology in clauses only. This study extends the empirical domain by testing functional morphemes in English nominal phrases in aphasia and comparing patients’ impairment to their impairment of functional morphemes in English clauses. In the linguistics literature, it is assumed that clauses and nominal phrases are structurally parallel but exhibit inflectional differences. The results of the present study indicated that aphasic speakers evinced similar impairment patterns in clauses and nominal phrases. These findings are consistent with the Distributed Morphology Hypothesis (DMH), suggesting that the source of functional morphology deficits among agrammatics relates to difficulty implementing rules that convert inflectional features into morphemes. Our findings, however, are inconsistent with the Tree Pruning Hypothesis (TPH), which suggests that patients have difficulty building complex hierarchical structures. PMID:26379370

  6. A cost-effective methodology for the design of massively-parallel VLSI functional units

    NASA Technical Reports Server (NTRS)

    Venkateswaran, N.; Sriram, G.; Desouza, J.

    1993-01-01

    In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the performance of functional units designed by our method yields promising results.

  7. Parallel fixed point implementation of a radial basis function network in an FPGA.

    PubMed

    de Souza, Alisson C D; Fernandes, Marcelo A C

    2014-01-01

    This paper proposes a parallel fixed point radial basis function (RBF) artificial neural network (ANN), implemented in a field programmable gate array (FPGA) trained online with a least mean square (LMS) algorithm. The processing time and occupied area were analyzed for various fixed point formats. The problems of precision of the ANN response for nonlinear classification using the XOR gate and interpolation using the sine function were also analyzed in a hardware implementation. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA. PMID:25268918
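
    For readers unfamiliar with the RBF/LMS combination, the floating-point sketch below shows the data flow the paper maps to fixed-point FPGA hardware: Gaussian hidden units feed a linear output layer whose weights are updated online by LMS. The class name, width, and learning rate are illustrative assumptions, not values from the paper.

    ```python
    import numpy as np

    class RBFNetworkLMS:
        """Gaussian RBF network trained online with LMS (floating point here;
        the paper's contribution is a fixed-point FPGA realization)."""
        def __init__(self, centers, width=1.0, lr=0.05):
            self.c = np.asarray(centers, dtype=float)   # (n_centers, dim)
            self.beta = 1.0 / (2 * width ** 2)
            self.w = np.zeros(len(self.c))
            self.lr = lr

        def _phi(self, x):
            d2 = np.sum((self.c - np.asarray(x, dtype=float)) ** 2, axis=1)
            return np.exp(-self.beta * d2)              # hidden-layer activations

        def predict(self, x):
            return float(self.w @ self._phi(x))

        def train_step(self, x, target):
            phi = self._phi(x)
            err = target - self.w @ phi
            self.w += self.lr * err * phi               # LMS update
            return err

    # XOR example from the abstract: the four corners as centers, trained online.
    net = RBFNetworkLMS(centers=[[0, 0], [0, 1], [1, 0], [1, 1]], width=0.5)
    for _ in range(500):
        for x, t in [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]:
            net.train_step(x, t)
    ```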

  9. Electromagnetic semitransparent δ-function plate: Casimir interaction energy between parallel infinitesimally thin plates

    NASA Astrophysics Data System (ADS)

    Parashar, Prachi; Milton, Kimball A.; Shajesh, K. V.; Schaden, M.

    2012-10-01

    We derive boundary conditions for electromagnetic fields on a δ-function plate. The optical properties of such a plate are shown to necessarily be anisotropic in that they only depend on the transverse properties of the plate. We unambiguously obtain the boundary conditions for a perfectly conducting δ-function plate in the limit of infinite dielectric response. We show that a material does not “optically vanish” in the thin-plate limit. The thin-plate limit of a plasma slab of thickness d with plasma frequency ω_p² = ζ_p/d reduces to a δ-function plate for frequencies (ω = iζ) satisfying ζd ≪ ζ_p d ≪ 1. We show that the Casimir interaction energy between two parallel perfectly conducting δ-function plates is the same as that for parallel perfectly conducting slabs. Similarly, we show that the interaction energy between an atom and a perfect electrically conducting δ-function plate is the usual Casimir-Polder energy, which is verified by considering the thin-plate limit of dielectric slabs. The “thick” and “thin” boundary conditions considered by Bordag are found to be identical in the sense that they lead to the same electromagnetic fields.

  10. A parallel approach of COFFEE objective function to multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.

    2015-09-01

    Computational tools to assist genomic analyses have become ever more necessary due to the fast-growing amount of available data. Given the high computational cost of deterministic algorithms for sequence alignment, many works concentrate on developing heuristic approaches to multiple sequence alignment. However, selecting an approach that offers solutions with good biological significance in feasible execution time is a great challenge. This work therefore parallelizes the processing steps of the MSA-GA tool, using the multithread paradigm to execute the COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sequence sets with low similarity are aligned. In previous studies we implemented the COFFEE objective function in the tool to smooth these distortions. Although the nature of the COFFEE objective function increases execution time, it presents steps that can be executed in parallel. With the improvements implemented in this work, the new approach is 24% faster than the sequential approach with COFFEE. Moreover, the multithreaded COFFEE approach outperforms WSP: besides being slightly faster, it yields better biological results.
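
    The parallelizable steps are the sequence pairs: a COFFEE-style score is an average consistency over all pairs, which are mutually independent. The sketch below scores pairs in separate threads; the consistency measure is a simplified placeholder rather than the MSA-GA implementation, and in pure CPython the threads mainly illustrate the structure, since the GIL limits actual speedup for pure-Python workloads.

    ```python
    from concurrent.futures import ThreadPoolExecutor
    from itertools import combinations

    def pair_consistency(alignment, i, j, library):
        # Fraction of residue pairs aligned between rows i and j that also
        # appear in a precomputed pairwise-alignment library (simplified).
        hits = total = 0
        for a, b in zip(alignment[i], alignment[j]):
            if a != "-" and b != "-":
                total += 1
                hits += (a, b) in library
        return hits / total if total else 0.0

    def coffee_score(alignment, library, n_threads=4):
        pairs = list(combinations(range(len(alignment)), 2))
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            scores = pool.map(lambda p: pair_consistency(alignment, *p, library),
                              pairs)
        return sum(scores) / len(pairs)

    aln = ["AC-GT", "ACGGT", "A--GT"]
    print(coffee_score(aln, library={("A", "A"), ("C", "C"), ("G", "G"), ("T", "T")}))
    ```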

  11. WSINV3DMT: Vertical magnetic field transfer function inversion and parallel implementation

    NASA Astrophysics Data System (ADS)

    Siripunvaraporn, Weerachai; Egbert, Gary

    2009-04-01

    We describe two extensions to the three-dimensional magnetotelluric inversion program WSINV3DMT (Siripunvaraporn, W., Egbert, G., Lenbury, Y., Uyeshima, M., 2005, Three-dimensional magnetotelluric inversion: data-space method. Phys. Earth Planet. Interiors 150, 3-14), including modifications to allow inversion of the vertical magnetic transfer functions (VTFs), and parallelization of the code. The parallel implementation, which is most appropriate for small clusters, uses MPI to distribute forward solutions for different frequencies, as well as some linear algebraic computations, over multiple processors. In addition to reducing run times, the parallelization reduces memory requirements by distributing storage of the sensitivity matrix. Both new features are tested on synthetic and real datasets, revealing nearly linear speedup for a small number of processors (up to 8). Experiments on synthetic examples show that the horizontal position and lateral conductivity contrasts of anomalies can be recovered by inverting VTFs alone. However, vertical positions and absolute amplitudes are not well constrained unless an accurate host resistivity is imposed a priori. On very simple synthetic models including VTFs in a joint inversion had little impact on the inverse solution computed with impedances alone. However, in experiments with real data, inverse solutions obtained from joint inversion of VTF and impedances, and from impedances alone, differed in important ways, suggesting that for structures with more realistic levels of complexity the VTFs will in general provide useful additional constraints.
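
    The frequency-distribution strategy can be sketched with mpi4py (the actual code is Fortran + MPI). Round-robin assignment of frequencies to ranks followed by a gather of the forward responses captures the structure; forward_solve is a placeholder, not the magnetotelluric forward solver.

    ```python
    from mpi4py import MPI

    def forward_solve(model, freq):
        # Placeholder physics standing in for the 3D MT forward problem.
        return {"freq": freq, "response": sum(model) / freq}

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    model = [1.0, 2.0, 3.0]
    frequencies = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0]

    # Round-robin assignment: each rank solves only its share of frequencies,
    # which also spreads storage of the corresponding sensitivity rows.
    mine = frequencies[rank::size]
    local = [forward_solve(model, f) for f in mine]

    # Gather all responses on rank 0 for the data-space inversion step.
    all_responses = comm.gather(local, root=0)
    if rank == 0:
        flat = [r for chunk in all_responses for r in chunk]
        print(f"collected {len(flat)} forward solutions")
    ```

    Run with, for example, mpirun -n 4 python wsinv_sketch.py; each of the four ranks then solves two of the eight frequencies.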

  12. Finding zeros of nonlinear functions using the hybrid parallel cell mapping method

    NASA Astrophysics Data System (ADS)

    Xiong, Fu-Rui; Schütze, Oliver; Ding, Qian; Sun, Jian-Qiao

    2016-05-01

    Analysis of nonlinear dynamical systems including finding equilibrium states and stability boundaries often leads to a problem of finding zeros of vector functions. However, finding all the zeros of a set of vector functions in the domain of interest is quite a challenging task. This paper proposes a zero finding algorithm that combines the cell mapping methods and the subdivision techniques. Both the simple cell mapping (SCM) and generalized cell mapping (GCM) methods are used to identify a covering set of zeros. The subdivision technique is applied to enhance the solution resolution. The parallel implementation of the proposed method is discussed extensively. Several examples are presented to demonstrate the application and effectiveness of the proposed method. We then extend the study of finding zeros to the problem of finding stability boundaries of potential fields. Examples of two and three dimensional potential fields are studied. In addition to the effectiveness in finding the stability boundaries, the proposed method can handle several millions of cells in just a few seconds with the help of parallel computing in graphics processing units (GPUs).
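
    A stripped-down version of the covering-set idea is sketched below for a two-dimensional vector function: keep cells whose corner values bracket zero in every component, then subdivide the survivors. This is a coarse heuristic stand-in for the SCM/GCM machinery and the GPU parallelism described above; all names are illustrative.

    ```python
    import numpy as np

    def find_zero_cells(f, lo, hi, n=32, depth=3):
        # Cover [lo, hi]^2 with n x n cells; keep cells whose corner values
        # change sign in every component, then subdivide kept cells 2 x 2.
        lo, hi = np.asarray(lo, float), np.asarray(hi, float)
        cells = [(lo, hi)]
        for _ in range(depth):
            kept = []
            for clo, chi in cells:
                xs = np.linspace(clo[0], chi[0], n + 1)
                ys = np.linspace(clo[1], chi[1], n + 1)
                for i in range(n):
                    for j in range(n):
                        corners = [f(np.array([x, y]))
                                   for x in (xs[i], xs[i + 1])
                                   for y in (ys[j], ys[j + 1])]
                        vals = np.array(corners)      # shape (4, n_components)
                        if (np.all(vals.max(axis=0) >= 0)
                                and np.all(vals.min(axis=0) <= 0)):
                            kept.append((np.array([xs[i], ys[j]]),
                                         np.array([xs[i + 1], ys[j + 1]])))
            cells, n = kept, 2
        return cells

    # Zeros of F(x, y) = (x^2 + y^2 - 1, y - x) lie at +/-(1/sqrt(2), 1/sqrt(2)).
    cells = find_zero_cells(lambda p: np.array([p @ p - 1.0, p[1] - p[0]]),
                            lo=(-2, -2), hi=(2, 2))
    print(len(cells), "candidate cells")
    ```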

  13. Line-field parallel swept source MHz OCT for structural and functional retinal imaging.

    PubMed

    Fechtig, Daniel J; Grajciar, Branislav; Schmoll, Tilman; Blatter, Cedric; Werkmeister, Rene M; Drexler, Wolfgang; Leitgeb, Rainer A

    2015-03-01

    We demonstrate three-dimensional structural and functional retinal imaging with line-field parallel swept source imaging (LPSI) at acquisition speeds of up to 1 MHz equivalent A-scan rate with sensitivity better than 93.5 dB at a central wavelength of 840 nm. The results demonstrate competitive sensitivity, speed, image contrast and penetration depth when compared to conventional point scanning OCT. LPSI allows high-speed retinal imaging of function and morphology with commercially available components. We further demonstrate a method that mitigates the effect of the lateral Gaussian intensity distribution across the line focus and demonstrate and discuss the feasibility of high-speed optical angiography for visualization of the retinal microcirculation. PMID:25798298

  15. Free minimization of the fundamental measure theory functional: Freezing of parallel hard squares and cubes.

    PubMed

    Belli, S; Dijkstra, M; van Roij, R

    2012-09-28

    Due to remarkable advances in colloid synthesis techniques, systems of squares and cubes, once an academic abstraction for theorists and simulators, are nowadays an experimental reality. By means of a free minimization of the free-energy functional, we apply fundamental measure theory to analyze the phase behavior of parallel hard squares and hard cubes. We compare our results with those obtained by the traditional approach based on the Gaussian parameterization, finding small deviations and good overall agreement between the two methods. For hard squares, our predictions feature at intermediate packing fraction a smectic phase, which is however expected to be unstable due to thermal fluctuations. Due to this inconsistency, we cannot determine unambiguously the prediction of the theory for the expected fluid-to-crystal transition of parallel hard squares, but we deduce two alternative scenarios: (i) a second-order transition with a coexisting vacancy-rich crystal or (ii) a higher-density first-order transition with a coexisting crystal characterized by a lower vacancy concentration. In accordance with previous studies, a second-order transition with a high vacancy concentration is predicted for hard cubes. PMID:23020342

  16. Implementation of the Turn Function Method in a three-dimensional, parallelized hydrodynamics code

    NASA Astrophysics Data System (ADS)

    O'Rourke, P. J.; Fairfield, M. S.

    1992-08-01

    The implementation of the Turn Function Method in KIVA-F90, a version of the KIVA computer program written in Fortran 90 and used on some massively parallel computers, is described. The Turn Function Method solves both linear momentum and vorticity equations in numerical calculations of compressible fluid flow. Solving a vorticity equation allows vorticity to be both conserved and transported more accurately than in traditional methods for computing compressible flow. This first implementation of the Turn Function Method in a three-dimensional hydrodynamics code involved some modification of the original method and some numerical difference approximations. In particular, a penalty method is used to keep the divergence of the computed vorticity field close to zero. Difference operators are also defined in such a way that the finite difference analog of ∇·(∇×u) = 0 is exactly satisfied. Three example problems show the increased computational cost and the accuracy to be gained by using the Turn Function Method in calculations of flows with rotational motion. Use of the method can increase the computational time of the Euler equation solver in KIVA-F90 by 60 percent, but it is concluded that this increased cost is justified by the increased accuracy.

  17. Adaptation to warmer climates by parallel functional evolution of CBF genes in Arabidopsis thaliana.

    PubMed

    Monroe, J Grey; McGovern, Cullen; Lasky, Jesse R; Grogan, Kelsi; Beck, James; McKay, John K

    2016-08-01

    The evolutionary processes and genetics underlying local adaptation at a specieswide level are largely unknown. Recent work has indicated that a frameshift mutation in a member of a family of transcription factors, C-repeat binding factors or CBFs, underlies local adaptation and freezing tolerance divergence between two European populations of Arabidopsis thaliana. To ask whether the specieswide evolution of CBF genes in Arabidopsis is consistent with local adaptation, we surveyed CBF variation from 477 wild accessions collected across the species' range. We found that CBF sequence variation is strongly associated with winter temperature variables. Looking specifically at the minimum temperature experienced during the coldest month, we found that Arabidopsis from warmer climates exhibit a significant excess of nonsynonymous polymorphisms in CBF genes and revealed a CBF haplotype network whose structure points to multiple independent transitions to warmer climates. We also identified a number of newly described mutations of significant functional effect in CBF genes, similar to the frameshift mutation previously indicated to be locally adaptive in Italy, and find that they are significantly associated with warm winters. Lastly, we uncover relationships between climate and the position of significant functional effect mutations between and within CBF paralogs, suggesting variation in adaptive function of different mutations. Cumulatively, these findings support the hypothesis that disruption of CBF gene function is adaptive in warmer climates, and illustrate how parallel evolution in a transcription factor can underlie adaptation to climate. PMID:27247130

  18. Depth estimation via parallel coevolution of disparity functions for area-based stereo

    NASA Astrophysics Data System (ADS)

    Liatsis, Panos; Goulermas, John Y.

    2001-02-01

    A novel system for depth estimation is proposed with the use of Symbiotic Genetic Algorithms for the continuous problem of disparity surface approximation. The approach is based on the decomposition of the entire surface to very small non-overlapping patches described by low order bivariate polynomials and the use of symbiotic optimization to enforce smoothness at the boundaries of these patches, so that the entire surface can be approximated in a smooth piecewise fashion by functionals of local support. Such optimization is amenable to a massively parallel implementation, since each patch is optimized by a different execution unit and each unit communicates through its cost function only with its four-connected neighbors. The method makes use of various existing crossover and mutation schemes for real-valued chromosome representations and a new problem-specific mechanism for generating and hybridizing the initial populations. The proposed multi-objective cost function enforces photometric similarity and smoothness between the patch boundaries at a local scale, which in the long term give rise to a globally smooth disparity surface.

  19. Leukocytosis and natural killer cell function parallel neurobehavioral fatigue induced by 64 hours of sleep deprivation.

    PubMed Central

    Dinges, D F; Douglas, S D; Zaugg, L; Campbell, D E; McMann, J M; Whitehouse, W G; Orne, E C; Kapoor, S C; Icaza, E; Orne, M T

    1994-01-01

    The hypothesis that sleep deprivation depresses immune function was tested in 20 adults, selected on the basis of their normal blood chemistry, monitored in a laboratory for 7 d, and kept awake for 64 h. At 2200 h each day measurements were taken of total leukocytes (WBC), monocytes, granulocytes, lymphocytes, eosinophils, erythrocytes (RBC), B and T lymphocyte subsets, activated T cells, and natural killer (NK) subpopulations (CD56/CD8 dual-positive cells, CD16-positive cells, CD57-positive cells). Functional tests included NK cytotoxicity, lymphocyte stimulation with mitogens, and DNA analysis of cell cycle. Sleep loss was associated with leukocytosis and increased NK cell activity. At the maximum sleep deprivation, increases were observed in counts of WBC, granulocytes, monocytes, NK activity, and the proportion of lymphocytes in the S phase of the cell cycle. Changes in monocyte counts correlated with changes in other immune parameters. Counts of CD4, CD16, CD56, and CD57 lymphocytes declined after one night without sleep, whereas CD56 and CD57 counts increased after two nights. No changes were observed in other lymphocyte counts, in proliferative responses to mitogens, or in plasma levels of cortisol or adrenocorticotropin hormone. The physiologic leukocytosis and NK activity increases during deprivation were eliminated by recovery sleep in a manner parallel to neurobehavioral function, suggesting that the immune alterations may be associated with biological pressure for sleep. PMID:7910171

  1. Parallel Loss-of-Function at the RPM1 Bacterial Resistance Locus in Arabidopsis thaliana

    PubMed Central

    Rose, Laura; Atwell, Susanna; Grant, Murray; Holub, Eric B.

    2012-01-01

    Dimorphism at the Resistance to Pseudomonas syringae pv. maculicola 1 (RPM1) locus is well documented in natural populations of Arabidopsis thaliana and has been portrayed as a long-term balanced polymorphism. The haplotype from resistant plants contains the RPM1 gene, which enables these plants to recognize at least two structurally unrelated bacterial effector proteins (AvrB and AvrRpm1) from bacterial crop pathogens. A complete deletion of the RPM1 coding sequence has been interpreted as a single event resulting in susceptibility in these individuals. Consequently, the ability to revert to resistance or for alternative R-gene specificities to evolve at this locus has also been lost in these individuals. Our survey of variation at the RPM1 locus in a large species-wide sample of A. thaliana has revealed four new loss-of-function alleles that contain most of the intervening sequence of the RPM1 open reading frame. Multiple loss-of-function alleles may have originated due to the reported intrinsic cost to plants expressing the RPM1 protein. The frequency and geographic distribution of rpm1 alleles observed in our survey indicate the parallel origin and maintenance of these loss-of-function mutations and reveal a more complex history of natural selection at this locus than previously thought. PMID:23272006

  2. Saccharomyces cerevisiae MPT5 and SSD1 function in parallel pathways to promote cell wall integrity.

    PubMed Central

    Kaeberlein, Matt; Guarente, Leonard

    2002-01-01

    Yeast MPT5 (UTH4) is a limiting component for longevity. We show here that MPT5 also functions to promote cell wall integrity. Loss of Mpt5p results in phenotypes associated with a weakened cell wall, including sorbitol-remedial temperature sensitivity and sensitivities to calcofluor white and sodium dodecyl sulfate. Additionally, we find that mutation of MPT5, in the absence of SSD1-V, is lethal in combination with loss of either Ccr4p or Swi4p. These synthetic lethal interactions are suppressed by the SSD1-V allele. Furthermore, we have provided evidence that the short life span caused by loss of Mpt5p is due to a weakened cell wall. This cell wall defect may be the result of abnormal chitin biosynthesis or accumulation. These analyses have defined three genetic pathways that function in parallel to promote cell integrity: an Mpt5p-containing pathway, an Ssd1p-containing pathway, and a Pkc1p-dependent pathway. This work also provides evidence that post-transcriptional regulation is likely to be important both for maintaining cell integrity and for promoting longevity. PMID:11805047

  3. Instantaneous, parallel mapping of protein electronic function with angle-resolved coherent wave-mixing

    NASA Astrophysics Data System (ADS)

    Mercer, Ian

    2010-03-01

    We present a novel laser method, angle-resolved coherent (ARC) wave-mixing, that separates out coherent electronic couplings from energy transfers in an instantaneous two-dimensional mapping (Ian P. Mercer et al., Phys. Rev. Lett. 102, 057402 (2009)). For this we use an ultra-broadband hollow fibre laser source. The power of the new method is demonstrated with the light harvesting complex II (LH2) of purple bacteria at ambient temperature. We observe signatures of coherent quantum electronic beating, a correlation between excitation and emission energies in the protein, and a coherent component to the energy transfer between molecular rings. We are interested in exploring avenues for high-throughput fingerprinting of molecular structure and function. Massively parallel maps, rich in detail, can be taken from solutions, surface films, or solids of between 1 and 1000 μL. Each ARC map is generated instantaneously, with high throughput (currently up to a 1 kHz frame rate), and is noninvasive.

  4. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.

    PubMed

    Natarajan, Chandrasekhar; Projecto-Garcia, Joana; Moriyama, Hideaki; Weber, Roy E; Muñoz-Fuentes, Violeta; Green, Andy J; Kopuchian, Cecilia; Tubaro, Pablo L; Alza, Luis; Bulgarella, Mariana; Smith, Matthew M; Wilson, Robert E; Fago, Angela; McCracken, Kevin G; Storz, Jay F

    2015-12-01

    A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level. PMID:26637114

  5. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level

    PubMed Central

    Natarajan, Chandrasekhar; Projecto-Garcia, Joana; Moriyama, Hideaki; Weber, Roy E.; Muñoz-Fuentes, Violeta; Green, Andy J.; Kopuchian, Cecilia; Tubaro, Pablo L.; Alza, Luis; Bulgarella, Mariana; Smith, Matthew M.; Wilson, Robert E.; Fago, Angela; McCracken, Kevin G.; Storz, Jay F.

    2015-01-01

    A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level. PMID:26637114

  6. High-throughput optogenetic functional magnetic resonance imaging with parallel computations

    PubMed Central

    Fang, Zhongnan; Lee, Jin Hyung

    2013-01-01

    Optogenetic functional magnetic resonance imaging (ofMRI) technology enables cell-type specific, temporally precise neuronal control and accurate, in vivo readout of resulting activity across the whole brain. With the ability to precisely control excitation and inhibition parameters, and to accurately record the resulting activity, there is an increased need for a high-throughput method to bring ofMRI studies to their full potential. In this paper, an advanced system that allows real-time fMRI with interactive control and analysis in a fraction of the MRI acquisition repetition time (TR) is proposed. With such high processing speed, sufficient time will be available for integration of future developments that can further enhance ofMRI data quality or better streamline the study. We designed and implemented a highly optimized, massively parallel system using graphics processing units (GPUs) which achieves reconstruction, motion correction, and analysis of 3D volume data in approximately 12.80 ms. As a result, with a 750 ms TR and 4-interleave fMRI acquisition, we can now conduct sliding-window reconstruction, motion correction, analysis, and display in approximately 1.7% of the TR. Therefore, a significant amount of time can now be allocated to integrating advanced but computationally intensive methods that can enable higher image quality and better analysis results, all within a TR. Utilizing the proposed high-throughput imaging platform with sliding-window reconstruction, we were also able to observe the much-debated initial dips in our ofMRI data. Combined with methods to further improve SNR, the proposed system will enable efficient real-time, interactive, high-throughput ofMRI studies. PMID:23747482
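
    As a quick check of the throughput figures quoted above, the arithmetic can be reproduced directly (an illustrative sketch using only numbers from the abstract; the variable names are ours):

    ```python
    # Processing-time budget from the abstract: a 12.80 ms pipeline vs. a 750 ms TR.
    tr_ms = 750.0        # MRI repetition time (TR), in milliseconds
    pipeline_ms = 12.80  # reconstruction + motion correction + analysis + display

    fraction_of_tr = pipeline_ms / tr_ms
    print(f"pipeline occupies {fraction_of_tr:.1%} of each TR")  # ~1.7%
    print(f"headroom per TR: {tr_ms - pipeline_ms:.2f} ms")      # 737.20 ms
    ```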

  7. Energy distribution functions of kilovolt ions parallel and perpendicular to the magnetic field of a modified Penning discharge

    NASA Technical Reports Server (NTRS)

    Roth, R. J.

    1973-01-01

    The distribution function of ion energy parallel to the magnetic field of a modified Penning discharge has been measured with a retarding potential energy analyzer. These ions escaped through one of the throats of the magnetic mirror geometry. Simultaneous measurements of the ion energy distribution function perpendicular to the magnetic field have been made with a charge exchange neutral detector. The ion energy distribution functions are approximately Maxwellian, and the parallel and perpendicular kinetic temperatures are equal within experimental error. These results suggest that turbulent processes previously observed in this discharge Maxwellianize the velocity distribution along a radius in velocity space and cause an isotropic energy distribution. When the distributions depart from Maxwellian, they are enhanced above the Maxwellian tail.
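
    For reference, the Maxwellian forms against which the measured distributions are compared can be written as follows (standard kinetic-theory expressions, not equations taken from the report):

    ```latex
    % Isotropic (3D) Maxwellian ion energy distribution at kinetic temperature T:
    f(E) \;\propto\; \sqrt{E}\, \exp\!\bigl(-E / k_B T\bigr)
    % One-dimensional energy distribution parallel to B; the report finds
    % T_parallel ~ T_perp within experimental error:
    f(E_\parallel) \;\propto\; E_\parallel^{-1/2}\,
      \exp\!\bigl(-E_\parallel / k_B T_\parallel\bigr)
    ```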

  8. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.
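
    The abstract does not show Charon's API; the sketch below only illustrates the kind of block data distribution such a toolkit computes when assigning a structured-grid dimension to a set of processors (`block_range` is a hypothetical name, not Charon's):

    ```python
    def block_range(n_points, n_procs, rank):
        """Split n_points grid points as evenly as possible across n_procs
        processors; return the half-open index range owned by `rank`."""
        base, extra = divmod(n_points, n_procs)
        lo = rank * base + min(rank, extra)
        hi = lo + base + (1 if rank < extra else 0)
        return lo, hi

    # Example: distribute a 1000-point grid dimension over 8 processors.
    for rank in range(8):
        print(rank, block_range(1000, 8, rank))  # contiguous, near-equal blocks
    ```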

  9. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    NASA Technical Reports Server (NTRS)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  10. Parallel algorithms and architectures

    SciTech Connect

    Albrecht, A.; Jung, H.; Mehlhorn, K.

    1987-01-01

    Contents of this book include the following: Preparata: Deterministic simulation of idealized parallel computers on more realistic ones; Convex hull of randomly chosen points from a polytope; Dataflow computing; Parallel in sequence; Towards the architecture of an elementary cortical processor; Parallel algorithms and static analysis of parallel programs; Parallel processing of combinatorial search; Communications; An O(n log n) cost parallel algorithm for the single function coarsest partition problem; Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region; RELACS - A recursive layout computing system; and Parallel linear conflict-free subtree access.

  11. Memorability of commands learned as keywords or function keys: A parallel to voice recognition interfaces

    SciTech Connect

    Sorn, K.; Schultz, E.E. Jr.

    1987-09-15

    Voice recognition interfaces require users to input keywords to access and control functions. An experiment was conducted to compare users' memory for keywords relative to the names of equivalent function keys. Thirty-five subjects attempted to learn word processing functions either as keywords or in terms of the names of function keys that allowed access to and control of these functions. Keyword learning produced a significantly higher proportion of correct recalls and fewer intrusions (false recalls) on both an immediate retention test and an unexpected second test two weeks later. Superior keyword memorability is an important potential advantage of voice recognition interfaces.

  12. Requirements for implementing real-time control functional modules on a hierarchical parallel pipelined system

    NASA Technical Reports Server (NTRS)

    Wheatley, Thomas E.; Michaloski, John L.; Lumia, Ronald

    1989-01-01

    Analysis of a robot control system leads to a broad range of processing requirements. One fundamental requirement of a robot control system is a microcomputer system that provides sufficient processing capability. The use of multiple processors in a parallel architecture is beneficial for a number of reasons, including better cost performance, modular growth, increased reliability through replication, and flexibility for testing alternate control strategies via different partitioning. A survey of the progression from low-level control synchronizing primitives to higher-level communication tools is presented. The system communication and control mechanisms of existing robot control systems are compared to the hierarchical control model. The impact of this design methodology on current robot control systems is explored.

  13. Morphological Functions with Parallel Sets for the Pore Space of X-ray CT Images of Soil Columns

    NASA Astrophysics Data System (ADS)

    San José Martínez, F.; Muñoz Ortega, F. J.; Caniego Monreal, F. J.; Peregrina, F.

    2016-03-01

    During the last few decades, new imaging techniques like X-ray computed tomography have made available rich and detailed information of the spatial arrangement of soil constituents, usually referred to as soil structure. Mathematical morphology provides a plethora of mathematical techniques to analyze and parameterize the geometry of soil structure. They provide a guide to design the process from image analysis to the generation of synthetic models of soil structure in order to investigate key features of flow and transport phenomena in soil. In this work, we explore the ability of morphological functions built over Minkowski functionals with parallel sets of the pore space to characterize and quantify pore space geometry of columns of intact soil. These morphological functions seem to discriminate the effects on soil pore space geometry of contrasting management practices in a Mediterranean vineyard, and they provide the first step toward identifying the statistical significance of the observed differences.
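
    In the standard formulation (our notation; the paper's definitions may differ in detail), the morphological functions are the 3D Minkowski functionals evaluated on parallel sets of the pore space:

    ```latex
    % Parallel set of the pore space X at dilation radius r (B_r = ball of radius r):
    X_r \;=\; X \oplus B_r \;=\; \{\, x : d(x, X) \le r \,\}
    % Morphological functions: the four 3D Minkowski functionals -- volume V,
    % surface area S, integral of mean curvature C, Euler characteristic chi --
    % evaluated on X_r as functions of the dilation radius r:
    r \;\longmapsto\; \bigl( V(X_r),\; S(X_r),\; C(X_r),\; \chi(X_r) \bigr)
    ```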

  14. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers

    PubMed Central

    Spencer, Sarah J; Tamminen, Manu V; Preheim, Sarah P; Guo, Mira T; Briggs, Adrian W; Brito, Ilana L; Weitz, David A; Pitkänen, Leena K; Vigneault, Francois; Virta, Marko P Juhani; Alm, Eric J

    2016-01-01

    Many microbial communities are characterized by high genetic diversity. 16S ribosomal RNA sequencing can determine community members, and metagenomics can determine the functional diversity, but resolving the functional role of individual cells in high throughput remains an unsolved challenge. Here, we describe epicPCR (Emulsion, Paired Isolation and Concatenation PCR), a new technique that links functional genes and phylogenetic markers in uncultured single cells, providing a throughput of hundreds of thousands of cells with costs comparable to one genomic library preparation. We demonstrate the utility of our technique in a natural environment by profiling a sulfate-reducing community in a freshwater lake, revealing both known sulfate reducers and discovering new putative sulfate reducers. Our method is adaptable to any conserved genetic trait and translates genetic associations from diverse microbial samples into a sequencing library that answers targeted ecological questions. Potential applications include identifying functional community members, tracing horizontal gene transfer networks and mapping ecological interactions between microbial cells. PMID:26394010

  15. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers.

    PubMed

    Spencer, Sarah J; Tamminen, Manu V; Preheim, Sarah P; Guo, Mira T; Briggs, Adrian W; Brito, Ilana L; Weitz, David A; Pitkänen, Leena K; Vigneault, Francois; Virta, Marko P Juhani; Alm, Eric J

    2016-02-01

    Many microbial communities are characterized by high genetic diversity. 16S ribosomal RNA sequencing can determine community members, and metagenomics can determine the functional diversity, but resolving the functional role of individual cells in high throughput remains an unsolved challenge. Here, we describe epicPCR (Emulsion, Paired Isolation and Concatenation PCR), a new technique that links functional genes and phylogenetic markers in uncultured single cells, providing a throughput of hundreds of thousands of cells with costs comparable to one genomic library preparation. We demonstrate the utility of our technique in a natural environment by profiling a sulfate-reducing community in a freshwater lake, revealing both known sulfate reducers and discovering new putative sulfate reducers. Our method is adaptable to any conserved genetic trait and translates genetic associations from diverse microbial samples into a sequencing library that answers targeted ecological questions. Potential applications include identifying functional community members, tracing horizontal gene transfer networks and mapping ecological interactions between microbial cells. PMID:26394010

  16. A coarse-grained model for DNA-functionalized spherical colloids, revisited: effective pair potential from parallel replica simulations.

    PubMed

    Theodorakis, Panagiotis E; Dellago, Christoph; Kahl, Gerhard

    2013-01-14

    We discuss a coarse-grained model recently proposed by Starr and Sciortino [J. Phys.: Condens. Matter 18, L347 (2006)] for spherical particles functionalized with short single DNA strands. The model incorporates two key aspects of DNA hybridization, i.e., the specificity of binding between DNA bases and the strong directionality of hydrogen bonds. Here, we calculate the effective potential between two DNA-functionalized particles of equal size using a parallel replica protocol. We find that the transition from bonded to unbonded configurations takes place at considerably lower temperatures compared to those that were originally predicted using standard simulations in the canonical ensemble. We put particular focus on DNA-decorations of tetrahedral and octahedral symmetry, as they are promising candidates for the self-assembly into a single-component diamond structure. Increasing colloid size hinders hybridization of the DNA strands, in agreement with experimental findings. PMID:23320725

  17. Parallel Changes in Structural and Functional Measures of Optic Nerve Myelination after Optic Neuritis

    PubMed Central

    van der Walt, Anneke; Kolbe, Scott; Mitchell, Peter; Wang, Yejun; Butzkueven, Helmut; Egan, Gary; Yiannikas, Con; Graham, Stuart; Kilpatrick, Trevor; Klistorner, Alexander

    2015-01-01

    Introduction Visual evoked potential (VEP) latency prolongation and optic nerve lesion length after acute optic neuritis (ON) correspond to the degree of demyelination, while subsequent recovery of latency may represent optic nerve remyelination. We aimed to investigate the relationship between multifocal VEP (mfVEP) latency and optic nerve lesion length after acute ON. Methods Thirty acute ON patients were studied at 1, 3, 6, and 12 months using mfVEP and at 1 and 12 months with optic nerve MRI. LogMAR and low contrast visual acuity were documented. By one month, the mfVEP amplitude had recovered sufficiently for latency to be measured in 23 (76.7%) patients, with seven patients having no recordable mfVEP in more than 66% of segments in at least one test. Only data from these 23 patients were analysed further. Results Both latency and lesion length showed significant recovery during the follow-up period. Lesion length and mfVEP latency were highly correlated at 1 (r = 0.94, p < 0.0001) and 12 months (r = 0.75, p < 0.001). Both measures demonstrated a similar trend of recovery. Latency recovery was faster in the early follow-up period, while lesion length shortening remained relatively constant. At 1 month, latency delay worsened by 1.76 ms for each additional 1 mm of lesion length, while at 12 months, 1 mm of lesion length accounted for 1.94 ms of latency delay. Conclusion A strong association between two putative measures of demyelination in early and chronic ON was found. Parallel recovery of both measures could reflect optic nerve remyelination. PMID:26020925
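
    Restating the quoted slopes as linear relations between latency delay and lesion length (a paraphrase of the abstract's numbers, not a formula from the paper):

    ```latex
    % Latency delay (Delta t) vs. lesion length L at the two time points:
    \Delta t \;\approx\; 1.76\ \mathrm{ms\,mm^{-1}} \times L \quad (1\ \text{month}),
    \qquad
    \Delta t \;\approx\; 1.94\ \mathrm{ms\,mm^{-1}} \times L \quad (12\ \text{months})
    ```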

  18. The functional significance of cortical reorganization and the parallel development of CI therapy.

    PubMed

    Taub, Edward; Uswatte, Gitendra; Mark, Victor W

    2014-01-01

    For the nineteenth and the better part of the twentieth centuries two correlative beliefs were strongly held by almost all neuroscientists and practitioners in the field of neurorehabilitation. The first was that after maturity the adult CNS was hardwired and fixed, and second that in the chronic phase after CNS injury no substantial recovery of function could take place no matter what intervention was employed. However, in the last part of the twentieth century evidence began to accumulate that neither belief was correct. First, in the 1960s and 1970s, in research with primates given a surgical abolition of somatic sensation from a single forelimb, which rendered the extremity useless, it was found that behavioral techniques could convert the limb into an extremity that could be used extensively. Beginning in the late 1980s, the techniques employed with deafferented monkeys were translated into a rehabilitation treatment, termed Constraint Induced Movement therapy or CI therapy, for substantially improving the motor deficit in humans of the upper and lower extremities in the chronic phase after stroke. CI therapy has been applied successfully to other types of damage to the CNS such as traumatic brain injury, cerebral palsy, multiple sclerosis, and spinal cord injury, and it has also been used to improve function in focal hand dystonia and for aphasia after stroke. As this work was proceeding, it was being shown during the 1980s and 1990s that sustained modulation of afferent input could alter the structure of the CNS and that this topographic reorganization could have relevance to the function of the individual. The alteration in these once fundamental beliefs has given rise to important recent developments in neuroscience and neurorehabilitation and holds promise for further increasing our understanding of CNS function and extending the boundaries of what is possible in neurorehabilitation. PMID:25018720

  19. The functional significance of cortical reorganization and the parallel development of CI therapy

    PubMed Central

    Taub, Edward; Uswatte, Gitendra; Mark, Victor W.

    2014-01-01

    For the nineteenth and the better part of the twentieth centuries two correlative beliefs were strongly held by almost all neuroscientists and practitioners in the field of neurorehabilitation. The first was that after maturity the adult CNS was hardwired and fixed, and second that in the chronic phase after CNS injury no substantial recovery of function could take place no matter what intervention was employed. However, in the last part of the twentieth century evidence began to accumulate that neither belief was correct. First, in the 1960s and 1970s, in research with primates given a surgical abolition of somatic sensation from a single forelimb, which rendered the extremity useless, it was found that behavioral techniques could convert the limb into an extremity that could be used extensively. Beginning in the late 1980s, the techniques employed with deafferented monkeys were translated into a rehabilitation treatment, termed Constraint Induced Movement therapy or CI therapy, for substantially improving the motor deficit in humans of the upper and lower extremities in the chronic phase after stroke. CI therapy has been applied successfully to other types of damage to the CNS such as traumatic brain injury, cerebral palsy, multiple sclerosis, and spinal cord injury, and it has also been used to improve function in focal hand dystonia and for aphasia after stroke. As this work was proceeding, it was being shown during the 1980s and 1990s that sustained modulation of afferent input could alter the structure of the CNS and that this topographic reorganization could have relevance to the function of the individual. The alteration in these once fundamental beliefs has given rise to important recent developments in neuroscience and neurorehabilitation and holds promise for further increasing our understanding of CNS function and extending the boundaries of what is possible in neurorehabilitation. PMID:25018720

  20. The functional and anatomical organization of marsupial neocortex: Evidence for parallel evolution across mammals

    PubMed Central

    Karlen, Sarah J.; Krubitzer, Leah

    2007-01-01

    Marsupials are a diverse group of mammals that occupy a large range of habitats and have evolved a wide array of unique adaptations. Although they are as diverse as placental mammals, our understanding of marsupial brain organization is more limited. Like placental mammals, marsupials have striking similarities in neocortical organization, such as a constellation of cortical fields including S1, S2, V1, V2, and A1, that are functionally, architectonically, and connectionally distinct. In this review, we describe the general lifestyle and morphological characteristics of all marsupials and the organization of somatosensory, motor, visual, and auditory cortex. For each sensory system, we compare the functional organization and the corticocortical and thalamocortical connections of the neocortex across species. Differences between placental and marsupial species are discussed and the theories on neocortical evolution that have been derived from studying marsupials, particularly the idea of a sensorimotor amalgam, are evaluated. Overall, marsupials inhabit a variety of niches and assume many different lifestyles. For example, marsupials occupy terrestrial, arboreal, burrowing, and aquatic environments; some animals are highly social while others are solitary; and different species are carnivorous, herbivorous, or omnivorous. For each of these adaptations, marsupials have evolved an array of morphological, behavioral, and cortical specializations that are strikingly similar to those observed in placental mammals occupying similar habitats, which indicate that there are constraints imposed on evolving nervous systems that result in recurrent solutions to similar environmental challenges. PMID:17507143

  1. Predictive biomarker discovery through the parallel integration of clinical trial and functional genomics datasets.

    PubMed

    Swanton, Charles; Larkin, James M; Gerlinger, Marco; Eklund, Aron C; Howell, Michael; Stamp, Gordon; Downward, Julian; Gore, Martin; Futreal, P Andrew; Escudier, Bernard; Andre, Fabrice; Albiges, Laurence; Beuselinck, Benoit; Oudard, Stephane; Hoffmann, Jens; Gyorffy, Balázs; Torrance, Chris J; Boehme, Karen A; Volkmer, Hansjuergen; Toschi, Luisella; Nicke, Barbara; Beck, Marlene; Szallasi, Zoltan

    2010-01-01

    The European Union multi-disciplinary Personalised RNA interference to Enhance the Delivery of Individualised Cytotoxic and Targeted therapeutics (PREDICT) consortium has recently initiated a framework to accelerate the development of predictive biomarkers of individual patient response to anti-cancer agents. The consortium focuses on the identification of reliable predictive biomarkers to approved agents with anti-angiogenic activity for which no reliable predictive biomarkers exist: sunitinib, a multi-targeted tyrosine kinase inhibitor and everolimus, a mammalian target of rapamycin (mTOR) pathway inhibitor. Through the analysis of tumor tissue derived from pre-operative renal cell carcinoma (RCC) clinical trials, the PREDICT consortium will use established and novel methods to integrate comprehensive tumor-derived genomic data with personalized tumor-derived small hairpin RNA and high-throughput small interfering RNA screens to identify and validate functionally important genomic or transcriptomic predictive biomarkers of individual drug response in patients. PREDICT's approach to predictive biomarker discovery differs from conventional associative learning approaches, which can be susceptible to the detection of chance associations that lead to overestimation of true clinical accuracy. These methods will identify molecular pathways important for survival and growth of RCC cells and particular targets suitable for therapeutic development. Importantly, our results may enable individualized treatment of RCC, reducing ineffective therapy in drug-resistant disease, leading to improved quality of life and higher cost efficiency, which in turn should broaden patient access to beneficial therapeutics, thereby enhancing clinical outcome and cancer survival. The consortium will also establish and consolidate a European network providing the technological and clinical platform for large-scale functional genomic biomarker discovery. Here we review our current understanding

  2. Parallel blind deconvolution of astronomical images based on the fractal energy ratio of the image and regularization of the point spread function

    NASA Astrophysics Data System (ADS)

    Jia, Peng; Cai, Dongmei; Wang, Dong

    2014-11-01

    A parallel blind deconvolution algorithm is presented. The algorithm incorporates constraints on the point spread function (PSF) derived from the physical imaging process. Additionally, in order to obtain an effective restored image, the fractal energy ratio is used as an evaluation criterion to estimate image quality. The algorithm is parallelized at fine granularity to increase calculation speed. Results of numerical experiments and real experiments indicate that the algorithm is effective.
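
    The paper's own parallel algorithm is not reproduced in the abstract; for orientation, a classical alternating (Richardson-Lucy-style) blind deconvolution loop looks like the serial sketch below, with the fractal-energy-ratio scoring and the parallelization omitted:

    ```python
    import numpy as np
    from scipy.signal import fftconvolve

    def rl_blind(image, n_outer=10, n_inner=5):
        """Alternating Richardson-Lucy blind deconvolution (after Fish et al. 1995).
        Serial toy version with a full-frame PSF; `image` is a 2D float array."""
        obj = np.full_like(image, image.mean())   # initial object estimate
        psf = np.ones_like(image) / image.size    # initial (flat) PSF estimate
        eps = 1e-12                               # guard against division by zero
        for _ in range(n_outer):
            for _ in range(n_inner):              # update PSF, object held fixed
                ratio = image / (fftconvolve(obj, psf, mode="same") + eps)
                psf *= fftconvolve(ratio, obj[::-1, ::-1], mode="same")
                psf = np.clip(psf, 0.0, None)
                psf /= psf.sum()                  # keep the PSF normalized
            for _ in range(n_inner):              # update object, PSF held fixed
                ratio = image / (fftconvolve(obj, psf, mode="same") + eps)
                obj *= fftconvolve(ratio, psf[::-1, ::-1], mode="same")
        return obj, psf
    ```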

  3. Parallel transmit excitation at 1.5 T based on the minimization of a driving function for device heating

    PubMed Central

    Gudino, N.; Sonmez, M.; Yao, Z.; Baig, T.; Nielles-Vallespin, S.; Faranesh, A. Z.; Lederman, R. J.; Martens, M.; Balaban, R. S.; Hansen, M. S.; Griswold, M. A.

    2015-01-01

    Purpose: To provide a rapid method to reduce the radiofrequency (RF) E-field coupling and consequent heating in long conductors in an interventional MRI (iMRI) setup. Methods: A driving function for device heating (W) was defined as the integration of the E-field along the direction of the wire and calculated through a quasistatic approximation. Based on this function, the phases of four independently controlled transmit channels were dynamically changed in a 1.5 T MRI scanner. During the different excitation configurations, the RF induced heating in a nitinol wire immersed in a saline phantom was measured by fiber-optic temperature sensing. Additionally, a minimization of W as a function of phase and amplitude values of the different channels and constrained by the homogeneity of the RF excitation field (B1) over a region of interest was proposed and its results tested on the benchtop. To analyze the validity of the proposed method, using a model of the array and phantom setup tested in the scanner, RF fields and SAR maps were calculated through finite-difference time-domain (FDTD) simulations. In addition to phantom experiments, RF induced heating of an active guidewire inserted in a swine was also evaluated. Results: In the phantom experiment, heating at the tip of the device was reduced by 92% when replacing the body coil by an optimized parallel transmit excitation with same nominal flip angle. In the benchtop, up to 90% heating reduction was measured when implementing the constrained minimization algorithm with the additional degree of freedom given by independent amplitude control. The computation of the optimum phase and amplitude values was executed in just 12 s using a standard CPU. The results of the FDTD simulations showed similar trend of the local SAR at the tip of the wire and measured temperature as well as to a quadratic function of W, confirming the validity of the quasistatic approach for the presented problem at 64 MHz. Imaging and heating
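
    From the definitions stated in the abstract, the quantity being minimized can be written as follows (our reconstruction; the per-channel amplitudes and phases a_k, phi_k are our symbols):

    ```latex
    % Driving function for device heating: tangential E-field integrated along the
    % wire, with the total field a superposition of the four transmit channels:
    W \;=\; \Bigl|\, \int_{\text{wire}} \mathbf{E}(\mathbf{r}) \cdot \mathrm{d}\mathbf{l} \,\Bigr|,
    \qquad
    \mathbf{E}(\mathbf{r}) \;=\; \sum_{k=1}^{4} a_k\, e^{\,i\phi_k}\, \mathbf{E}_k(\mathbf{r})
    % Constrained minimization described in the abstract:
    \min_{\{a_k,\,\phi_k\}} W
    \quad \text{subject to a homogeneous } B_1 \text{ over the region of interest}
    ```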

  4. Parallel transmit excitation at 1.5 T based on the minimization of a driving function for device heating

    SciTech Connect

    Gudino, N.; Sonmez, M.; Nielles-Vallespin, S.; Faranesh, A. Z.; Lederman, R. J.; Balaban, R. S.; Hansen, M. S.; Yao, Z.; Baig, T.; Martens, M.; Griswold, M. A.

    2015-01-15

    Purpose: To provide a rapid method to reduce the radiofrequency (RF) E-field coupling and consequent heating in long conductors in an interventional MRI (iMRI) setup. Methods: A driving function for device heating (W) was defined as the integration of the E-field along the direction of the wire and calculated through a quasistatic approximation. Based on this function, the phases of four independently controlled transmit channels were dynamically changed in a 1.5 T MRI scanner. During the different excitation configurations, the RF induced heating in a nitinol wire immersed in a saline phantom was measured by fiber-optic temperature sensing. Additionally, a minimization of W as a function of phase and amplitude values of the different channels and constrained by the homogeneity of the RF excitation field (B1) over a region of interest was proposed and its results tested on the benchtop. To analyze the validity of the proposed method, using a model of the array and phantom setup tested in the scanner, RF fields and SAR maps were calculated through finite-difference time-domain (FDTD) simulations. In addition to phantom experiments, RF induced heating of an active guidewire inserted in a swine was also evaluated. Results: In the phantom experiment, heating at the tip of the device was reduced by 92% when replacing the body coil by an optimized parallel transmit excitation with same nominal flip angle. In the benchtop, up to 90% heating reduction was measured when implementing the constrained minimization algorithm with the additional degree of freedom given by independent amplitude control. The computation of the optimum phase and amplitude values was executed in just 12 s using a standard CPU. The results of the FDTD simulations showed similar trend of the local SAR at the tip of the wire and measured temperature as well as to a quadratic function of W, confirming the validity of the quasistatic approach for the presented problem at 64 MHz. Imaging and heating

  5. EUPDF: Eulerian Monte Carlo Probability Density Function Solver for Applications With Parallel Computing, Unstructured Grids, and Sprays

    NASA Technical Reports Server (NTRS)

    Raju, M. S.

    1998-01-01

    The success of any solution methodology used in the study of gas-turbine combustor flows depends a great deal on how well it can model the various complex and rate controlling processes associated with the spray's turbulent transport, mixing, chemical kinetics, evaporation, and spreading rates, as well as convective and radiative heat transfer and other phenomena. The phenomena to be modeled, which are controlled by these processes, often strongly interact with each other at different times and locations. In particular, turbulence plays an important role in determining the rates of mass and heat transfer, chemical reactions, and evaporation in many practical combustion devices. The influence of turbulence in a diffusion flame manifests itself in several forms, ranging from the so-called wrinkled, or stretched, flamelets regime to the distributed combustion regime, depending upon how turbulence interacts with various flame scales. Conventional turbulence models have difficulty treating highly nonlinear reaction rates. A solution procedure based on the composition joint probability density function (PDF) approach holds the promise of modeling various important combustion phenomena relevant to practical combustion devices (such as extinction, blowoff limits, and emissions predictions) because it can account for nonlinear chemical reaction rates without making approximations. In an attempt to advance the state-of-the-art in multidimensional numerical methods, we at the NASA Lewis Research Center extended our previous work on the PDF method to unstructured grids, parallel computing, and sprays. EUPDF, which was developed by M.S. Raju of Nyma, Inc., was designed to be massively parallel and could easily be coupled with any existing gas-phase and/or spray solvers. EUPDF can use an unstructured mesh with mixed triangular, quadrilateral, and/or tetrahedral elements. The application of the PDF method showed favorable results when applied to several supersonic

  6. Eclipse Parallel Tools Platform

    SciTech Connect

    Watson, Gregory; DeBardeleben, Nathan; Rasmussen, Craig

    2005-02-18

    Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs in these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices, and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform designed to provide a robust, full-featured, commercial-quality industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform-specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framework by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that provides only minimal functionality for parallel tool integration, support for a small number of parallel architectures, and basic

  7. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication

    NASA Astrophysics Data System (ADS)

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimensional (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’ shape microchannel. The key novelty of our approach lies in rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions, including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in a microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of polystyrene (PS) particles and cancer cells of different sizes. The filter can be cleaned by reversing the flow and reused many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips.

  8. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication.

    PubMed

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimensional (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given 'Y' shape microchannel. The key novelty of our approach lies in rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions, including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in a microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of polystyrene (PS) particles and cancer cells of different sizes. The filter can be cleaned by reversing the flow and reused many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips. PMID:26818119

  9. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication

    PubMed Central

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimensional (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’ shape microchannel. The key novelty of our approach lies in rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions, including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in a microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of polystyrene (PS) particles and cancer cells of different sizes. The filter can be cleaned by reversing the flow and reused many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips. PMID:26818119

  10. A Sparse Self-Consistent Field Algorithm and Its Parallel Implementation: Application to Density-Functional-Based Tight Binding.

    PubMed

    Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias

    2014-06-10

    We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion. PMID:26580754
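
    The abstract describes the method only at a high level; the dense toy loop below shows where the diagonalization bottleneck sits in a generic SCF cycle (illustrative code, not the paper's sparse local-orbital implementation; `build_fock` is a user-supplied placeholder):

    ```python
    import numpy as np

    def scf_loop(h0, n_elec, build_fock, tol=1e-8, max_iter=100):
        """Generic closed-shell self-consistent field loop (dense toy version)."""
        n_occ = n_elec // 2
        density = np.zeros_like(h0)
        energy_old = np.inf
        for _ in range(max_iter):
            fock = build_fock(h0, density)        # Fock/Kohn-Sham matrix
            eps, orbitals = np.linalg.eigh(fock)  # O(N^3) step the paper attacks
            occ = orbitals[:, :n_occ]             # occupied molecular orbitals
            density = 2.0 * occ @ occ.T           # closed-shell density matrix
            energy = 2.0 * eps[:n_occ].sum()      # toy band-structure energy
            if abs(energy - energy_old) < tol:    # SCF convergence criterion
                return energy, density
            energy_old = energy
        raise RuntimeError("SCF did not converge")
    ```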

  11. Parallel functional activity profiling reveals valvulopathogens are potent 5-hydroxytryptamine(2B) receptor agonists: implications for drug safety assessment.

    PubMed

    Huang, Xi-Ping; Setola, Vincent; Yadav, Prem N; Allen, John A; Rogan, Sarah C; Hanson, Bonnie J; Revankar, Chetana; Robers, Matt; Doucette, Chris; Roth, Bryan L

    2009-10-01

    Drug-induced valvular heart disease (VHD) is a serious side effect of a few medications, including some that are on the market. Pharmacological studies of VHD-associated medications (e.g., fenfluramine, pergolide, methysergide, and cabergoline) have revealed that they and/or their metabolites are potent 5-hydroxytryptamine(2B) (5-HT(2B)) receptor agonists. We have shown that activation of 5-HT(2B) receptors on human heart valve interstitial cells in vitro induces a proliferative response reminiscent of the fibrosis that typifies VHD. To identify current or future drugs that might induce VHD, we screened approximately 2200 U.S. Food and Drug Administration (FDA)-approved or investigational medications to identify 5-HT(2B) receptor agonists, using calcium-based high-throughput screening. Of these 2200 compounds, 27 were 5-HT(2B) receptor agonists (hits); 14 of these had previously been identified as 5-HT(2B) receptor agonists, including seven bona fide valvulopathogens. Six of the hits (guanfacine, quinidine, xylometazoline, oxymetazoline, fenoldopam, and ropinirole) are approved medications. Twenty-three of the hits were then "functionally profiled" (i.e., assayed in parallel for 5-HT(2B) receptor agonism using multiple readouts to test for functional selectivity). In these assays, the known valvulopathogens were efficacious at concentrations as low as 30 nM, whereas the other compounds were less so. Hierarchical clustering analysis of the pEC(50) data revealed that ropinirole (which is not associated with valvulopathy) was clearly segregated from known valvulopathogens. Taken together, our data demonstrate that patterns of 5-HT(2B) receptor functional selectivity might be useful for identifying compounds likely to induce valvular heart disease. PMID:19570945
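
    A hierarchical clustering of pEC50 profiles of the kind described can be sketched as follows (the compound names appear in the abstract, but the numerical values are invented for illustration):

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram

    # Hypothetical pEC50 matrix: rows = compounds, columns = functional readouts
    # (e.g., calcium flux, arrestin recruitment, ...). Values are illustrative only.
    compounds = ["fenfluramine", "pergolide", "cabergoline", "ropinirole"]
    pec50 = np.array([
        [7.5, 7.2, 6.9],
        [7.8, 7.6, 7.1],
        [7.4, 7.3, 7.0],
        [5.2, 5.0, 4.8],   # weak agonism across readouts
    ])

    # Agglomerative clustering on the potency profiles, analogous to the
    # hierarchical clustering analysis described in the abstract.
    tree = linkage(pec50, method="average", metric="euclidean")
    dendrogram(tree, labels=compounds, no_plot=True)  # render with matplotlib if desired
    ```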

  12. Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

    NASA Astrophysics Data System (ADS)

    Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

    2010-12-01

    Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization. New version program summary: Program Title: PROFESS. Catalogue identifier: AEBN_v2_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 68 721. No. of bytes in distributed program, including test data, etc.: 1 708 547. Distribution format: tar.gz. Programming language: Fortran 90. Computer
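
    The variational problem PROFESS solves can be summarized as follows (the standard OFDFT formulation, consistent with the description above):

    ```latex
    % OFDFT total energy: kinetic T_s, Hartree E_H, exchange-correlation E_xc,
    % and external (ion-electron) terms, minimized directly over the density rho
    % subject to a fixed electron number N (mu = Lagrange multiplier):
    E[\rho] \;=\; T_s[\rho] + E_{\mathrm{H}}[\rho] + E_{\mathrm{xc}}[\rho]
            + \int v_{\mathrm{ext}}(\mathbf{r})\,\rho(\mathbf{r})\,\mathrm{d}\mathbf{r}
    \qquad
    \frac{\delta E[\rho]}{\delta \rho(\mathbf{r})} \;=\; \mu,
    \qquad
    \int \rho(\mathbf{r})\,\mathrm{d}\mathbf{r} \;=\; N
    ```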

  13. Kinematic Modeling and Function Generation for Non-linear Curves Using 5R Double Arm Parallel Manipulator

    NASA Astrophysics Data System (ADS)

    Keshavkumar Kamaliya, Parth; Patel, Yashavant Kumar Dashrathlal

    2016-01-01

    Double arm configurations using parallel manipulators mimic human arm motions in either planar or spatial space. These configurations are currently attractive for researchers, as they can replace human workers without major redesign of the workplace in industries. The limited joint ranges of human arms can be overcome by the replacement of either revolute or spherical joints in the manipulator, so the maximum workspace can be exploited. A planar configuration with five revolute joints (5R) is considered to imitate human arm motions in a plane using a Double Arm Manipulator (DAM). Position analysis for a tool held in the end links of the configuration is carried out using Pro/mechanism in Creo® as well as SimMechanics. D-H parameters are formulated, and the results derived from MATLAB programs developed for this purpose are compared with mechanism simulation as well as SimMechanics results. An inverse kinematics model is developed for trajectory planning, in order to trace the tool trajectory in a continuous and smooth sequence. Polynomial functions are derived for position, velocity, and acceleration for linear and non-linear curves in joint space. Analytical results obtained for trajectory planning are validated with simulation results from Creo®.
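
    A common choice for such joint-space polynomial functions is a quintic with boundary conditions on position, velocity, and acceleration; the sketch below illustrates the construction (generic robotics practice, not necessarily the exact formulation used in the paper):

    ```python
    import numpy as np

    def quintic_coeffs(q0, qf, t_f, v0=0.0, vf=0.0, a0=0.0, af=0.0):
        """Coefficients c of q(t) = sum(c[i] * t**i) satisfying position,
        velocity, and acceleration boundary conditions at t = 0 and t = t_f."""
        A = np.array([
            [1, 0,   0,      0,        0,         0],
            [0, 1,   0,      0,        0,         0],
            [0, 0,   2,      0,        0,         0],
            [1, t_f, t_f**2, t_f**3,   t_f**4,    t_f**5],
            [0, 1,   2*t_f,  3*t_f**2, 4*t_f**3,  5*t_f**4],
            [0, 0,   2,      6*t_f,    12*t_f**2, 20*t_f**3],
        ], dtype=float)
        b = np.array([q0, v0, a0, qf, vf, af], dtype=float)
        return np.linalg.solve(A, b)

    # Example: move a joint from 0 rad to 1.2 rad in 2 s, at rest at both ends.
    c = quintic_coeffs(0.0, 1.2, 2.0)
    t = np.linspace(0.0, 2.0, 50)
    q = np.polyval(c[::-1], t)  # velocity/acceleration follow by differentiation
    ```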

  14. Parallelization and improvements of the generalized born model with a simple sWitching function for modern graphics processors.

    PubMed

    Arthur, Evan J; Brooks, Charles L

    2016-04-15

    Two fundamental challenges of simulating biologically relevant systems are the rapid calculation of the energy of solvation and the trajectory length of a given simulation. The Generalized Born model with a Simple sWitching function (GBSW) addresses these issues by using an efficient approximation of Poisson-Boltzmann (PB) theory to calculate each solute atom's free energy of solvation, the gradient of this potential, and the subsequent forces of solvation without the need for explicit solvent molecules. This study presents a parallel refactoring of the original GBSW algorithm and its implementation on newly available, low cost graphics chips with thousands of processing cores. Depending on the system size and nonbonded force cutoffs, the new GBSW algorithm offers speed increases of between one and two orders of magnitude over previous implementations while maintaining similar levels of accuracy. We find that much of the algorithm scales linearly with an increase of system size, which makes this water model cost effective for solvating large systems. Additionally, we utilize our GPU-accelerated GBSW model to fold the model system chignolin, and in doing so we demonstrate that these speed enhancements now make accessible folding studies of peptides and potentially small proteins. © 2016 Wiley Periodicals, Inc. PMID:26786647
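
    For context, the solvation energy that GBSW approximates has the familiar Generalized Born (Still) form (a standard textbook expression, not an equation quoted from this paper):

    ```latex
    % Generalized Born solvation energy (vacuum interior assumed); q_i = partial
    % charges, R_i = effective Born radii, r_ij = interatomic distances,
    % epsilon_w = solvent dielectric constant:
    \Delta G_{\mathrm{solv}} \;\approx\;
      -\frac{1}{2}\Bigl(1 - \frac{1}{\epsilon_w}\Bigr)
      \sum_{i,j} \frac{q_i\, q_j}{f_{ij}^{\mathrm{GB}}},
    \qquad
    f_{ij}^{\mathrm{GB}} \;=\;
      \sqrt{\, r_{ij}^2 + R_i R_j \exp\!\Bigl(-\frac{r_{ij}^2}{4 R_i R_j}\Bigr) }
    ```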

  15. Functional Traits in Parallel Evolutionary Radiations and Trait-Environment Associations in the Cape Floristic Region of South Africa.

    PubMed

    Mitchell, Nora; Moore, Timothy E; Mollmann, Hayley Kilroy; Carlson, Jane E; Mocko, Kerri; Martinez-Cabrera, Hugo; Adams, Christopher; Silander, John A; Jones, Cynthia S; Schlichting, Carl D; Holsinger, Kent E

    2015-04-01

    Evolutionary radiations with extreme levels of diversity present a unique opportunity to study the role of the environment in plant evolution. If environmental adaptation played an important role in such radiations, we expect to find associations between functional traits and key climatic variables. Similar trait-environment associations across clades may reflect common responses, while contradictory associations may suggest lineage-specific adaptations. Here, we explore trait-environment relationships in two evolutionary radiations in the fynbos biome of the highly biodiverse Cape Floristic Region (CFR) of South Africa. Protea and Pelargonium are morphologically and evolutionarily diverse genera that typify the CFR yet are substantially different in growth form and morphology. Our analytical approach employs a Bayesian multiple-response generalized linear mixed-effects model, taking into account covariation among traits and controlling for phylogenetic relationships. Of the pairwise trait-environment associations tested, 6 out of 24 were in the same direction and 2 out of 24 were in opposite directions, with the latter apparently reflecting alternative life-history strategies. These findings demonstrate that trait diversity within two plant lineages may reflect both parallel and idiosyncratic responses to the environment, rather than all taxa conforming to a global-scale pattern. Such insights are essential for understanding how trait-environment associations arise and how they influence species diversification. PMID:25811086

  16. A preliminary numerical evaluation of a parallel algorithm for approximating the values and subgradients of the recourse function in a stochastic program with complete recourse

    SciTech Connect

    Lessor, K.S.

    1988-08-26

    The parallel algorithm of Ariyawansa, Sorensen, and Wets for approximating the values and subgradients of the recourse function in a stochastic program with complete recourse is implemented and timing results are reported for limited experimental trials. 14 refs., 6 figs., 8 tabs.

  17. Parallel rendering

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  18. Eclipse Parallel Tools Platform

    Energy Science and Technology Software Center (ESTSC)

    2005-02-18

    Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs in these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices, and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to provide a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform-specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framework by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that provides only minimal functionality for parallel tool integration and support for a small number of parallel architectures.

  19. Three pillars for achieving quantum mechanical molecular dynamics simulations of huge systems: Divide-and-conquer, density-functional tight-binding, and massively parallel computation.

    PubMed

    Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi

    2016-08-01

    The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. Functions to perform large-scale geometry optimization and molecular dynamics on the DC-DFTB potential energy surface are implemented in the program, called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program: a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. PMID:27317328
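
    Determining the Fermi level reduces to a one-dimensional root-finding problem: find mu such that the Fermi-Dirac occupations sum to the electron count. DC-DFTB-K parallelizes this with a novel interpolation-based scheme; the sketch below only shows the underlying problem, solved with plain bisection:

```python
import numpy as np

def fermi_level(eigvals, n_elec, beta=1000.0, tol=1e-10):
    """Find mu so that the Fermi-Dirac occupations (2 electrons per level)
    sum to n_elec. Plain bisection on the monotonic count function; the
    DC-DFTB-K paper replaces this with an interpolation-based parallel scheme."""
    def count(mu):
        # clip avoids overflow in exp for eigenvalues far from mu
        x = np.clip(beta * (eigvals - mu), -500, 500)
        return 2.0 * np.sum(1.0 / (1.0 + np.exp(x)))
    lo, hi = eigvals.min() - 1.0, eigvals.max() + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count(mid) < n_elec:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eigs = np.sort(np.random.default_rng(0).normal(size=100))
print(f"Fermi level: {fermi_level(eigs, n_elec=60):.6f}")
```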

  20. Dynamic multi-swarm particle swarm optimizer using parallel PC cluster systems for global optimization of large-scale multimodal functions

    NASA Astrophysics Data System (ADS)

    Fan, Shu-Kai S.; Chang, Ju-Ming

    2010-05-01

    This article presents a novel parallel multi-swarm optimization (PMSO) algorithm with the aim of enhancing the search ability of standard single-swarm PSOs for global optimization of very large-scale multimodal functions. Different from the existing multi-swarm structures, the multiple swarms work in parallel, and the search space is partitioned evenly and dynamically assigned in a weighted manner via the roulette wheel selection (RWS) mechanism. This parallel, distributed framework of the PMSO algorithm is developed based on a master-slave paradigm, which is implemented on a cluster of PCs using message passing interface (MPI) for information interchange among swarms. The PMSO algorithm handles multiple swarms simultaneously and each swarm performs PSO operations of its own independently. In particular, one swarm is designated for global search and the others are for local search. The first part of the experimental comparison is made among the PMSO, standard PSO, and two state-of-the-art algorithms (CTSS and CLPSO) in terms of various un-rotated and rotated benchmark functions taken from the literature. In the second part, the proposed multi-swarm algorithm is tested on large-scale multimodal benchmark functions up to 300 dimensions. The results of the PMSO algorithm show great promise in solving high-dimensional problems.
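
    The multi-swarm core (several swarms, each running standard PSO updates around its own best) is easy to sketch serially. A minimal Python illustration, omitting the MPI master-slave layer, the roulette-wheel space partitioning, and the dedicated global-search swarm of the actual PMSO algorithm:

```python
import numpy as np

def rastrigin(x):
    # Classic multimodal benchmark of the kind used in such comparisons.
    return 10 * x.shape[-1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=-1)

def multi_swarm_pso(f, dim=10, n_swarms=4, n_particles=20, iters=300,
                    w=0.7, c1=1.5, c2=1.5, lo=-5.12, hi=5.12, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, (n_swarms, n_particles, dim))   # positions
    V = np.zeros_like(X)                                    # velocities
    P = X.copy()                                            # personal bests
    pf = f(P)                                               # personal-best values
    for _ in range(iters):
        for s in range(n_swarms):
            # Each swarm tracks its own best and searches independently.
            g = P[s, np.argmin(pf[s])]
            r1 = rng.random((n_particles, dim))
            r2 = rng.random((n_particles, dim))
            V[s] = w * V[s] + c1 * r1 * (P[s] - X[s]) + c2 * r2 * (g - X[s])
            X[s] = np.clip(X[s] + V[s], lo, hi)
            fx = f(X[s])
            better = fx < pf[s]
            P[s][better], pf[s][better] = X[s][better], fx[better]
    return pf.min()

print(f"best Rastrigin value found: {multi_swarm_pso(rastrigin):.4f}")
```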

  1. Massively parallel visualization: Parallel rendering

    SciTech Connect

    Hansen, C.D.; Krogh, M.; White, W.

    1995-12-01

    This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygon, sphere, and volumetric data. The polygon algorithm uses a data-parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations of these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  2. Random-iteration algorithm-based optical parallel architecture for fractal-image decoding by use of iterated-function system codes.

    PubMed

    Chang, H T; Kuo, C J

    1998-03-10

    An optical parallel architecture for the random-iteration algorithm to decode a fractal image by use of iterated-function system (IFS) codes is proposed. The code value is first converted into transmittance in film or a spatial light modulator in the optical part of the system. With an optical-to-electrical converter, electrical-to-optical converter, and some electronic circuits for addition and delay, we can perform the contractive affine transformation (CAT) denoted in IFS codes. In the proposed decoding architecture all CATs generate points (image pixels) in parallel, and these points are then joined for display purposes. Therefore the decoding speed is improved greatly compared with existing serial-decoding architectures. In addition, an error and stability analysis that considers nonperfect elements is presented for the proposed optical system. Finally, simulation results are given to validate the proposed architecture. PMID:18268718
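
    The random-iteration algorithm implemented optically here is the classic "chaos game". A software sketch in Python (the paper realizes the affine maps with films or spatial light modulators and analog addition; the well-known Barnsley fern IFS codes below stand in as an example):

```python
import numpy as np

# Barnsley fern IFS codes: each row is (a, b, c, d, e, f, probability) for the
# contractive affine transformation [x, y] -> [[a, b], [c, d]] @ [x, y] + [e, f].
IFS = np.array([
    [ 0.00,  0.00,  0.00,  0.16, 0.0, 0.00, 0.01],
    [ 0.85,  0.04, -0.04,  0.85, 0.0, 1.60, 0.85],
    [ 0.20, -0.26,  0.23,  0.22, 0.0, 1.60, 0.07],
    [-0.15,  0.28,  0.26,  0.24, 0.0, 0.44, 0.07],
])

def decode(ifs, n_points=50_000, seed=0):
    """Random-iteration (chaos game) decoding: repeatedly apply a randomly
    chosen CAT; the visited points converge onto the fractal attractor."""
    rng = np.random.default_rng(seed)
    probs = ifs[:, 6] / ifs[:, 6].sum()
    pts = np.empty((n_points, 2))
    p = np.zeros(2)
    for i in range(n_points):
        a, b, c, d, e, f, _ = ifs[rng.choice(len(ifs), p=probs)]
        p = np.array([a * p[0] + b * p[1] + e, c * p[0] + d * p[1] + f])
        pts[i] = p
    return pts

points = decode(IFS)
print(points.min(axis=0), points.max(axis=0))  # attractor bounding box
```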

  3. Parallel machines: Parallel machine languages

    SciTech Connect

    Iannucci, R.A. )

    1990-01-01

    This book presents a framework for understanding the tradeoffs between the conventional view and the dataflow view, with the objective of discovering the critical hardware structures which must be present in any scalable, general-purpose parallel computer to effectively tolerate latency and synchronization costs. The author presents an approach to scalable general-purpose parallel computation. Linguistic concerns, compiling issues, intermediate-language issues, and hardware/technological constraints are presented as a combined approach to architectural development. This book presents the notion of a parallel machine language.

  4. Parallel pipelining

    SciTech Connect

    Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.

    1995-09-01

    In this paper the authors introduce the idea of parallel pipelining for water-lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system, and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as N = (R/r)^α, where r and R are the radii of the small and large pipes, respectively, and α = 4 or 19/7 when the lubricating water flow is laminar or turbulent.

  5. Accelerating the performance of a novel meshless method based on collocation with radial basis functions by employing a graphical processing unit as a parallel coprocessor

    NASA Astrophysics Data System (ADS)

    Owusu-Banson, Derek

    In recent times, a variety of industries, applications and numerical methods, including the meshless method, have enjoyed a great deal of success by utilizing the graphical processing unit (GPU) as a parallel coprocessor. These benefits often include performance improvements over previous implementations. Furthermore, applications running on graphics processors enjoy superior performance per dollar and performance per watt compared with implementations built exclusively on traditional central processing technologies. The GPU was originally designed for graphics acceleration, but the modern GPU, known as the General Purpose Graphical Processing Unit (GPGPU), can be used for scientific and engineering calculations. The GPGPU consists of a massively parallel array of integer and floating-point processors, typically hundreds of processors per graphics card with dedicated high-speed memory. This work describes an application written by the author, titled GaussianRBF, to show the implementation and results of a novel meshless method that incorporates collocation with the Gaussian radial basis function by utilizing the GPU as a parallel coprocessor. Key phases of the proposed meshless method have been executed on the GPU using the NVIDIA CUDA software development kit. In particular, the matrix fill and solution phases have been carried out on the GPU, along with some post-processing. This approach resulted in a decreased processing time compared to a similar algorithm implemented on the CPU while maintaining the same accuracy.
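
    The collocation approach at the heart of such a method is compact enough to sketch. A minimal CPU illustration in NumPy (not the author's CUDA implementation) solving the 1D Poisson problem u'' = f with Gaussian RBFs; the matrix fill and solution phases named in the abstract correspond to building A and solving the linear system:

```python
import numpy as np

eps = 8.0                                  # shape parameter of the Gaussian RBF
phi    = lambda x, c: np.exp(-(eps * (x - c))**2)
phi_xx = lambda x, c: (4 * eps**4 * (x - c)**2 - 2 * eps**2) * np.exp(-(eps * (x - c))**2)

# Solve u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0; exact solution u = sin(pi x).
f = lambda x: -np.pi**2 * np.sin(np.pi * x)
n = 25
x = np.linspace(0.0, 1.0, n)               # collocation points double as RBF centers
X, C = np.meshgrid(x, x, indexing="ij")

# Matrix fill phase: PDE rows in the interior, Dirichlet rows at the boundary.
# (Gaussian RBF systems grow ill-conditioned as eps shrinks; kept moderate here.)
A = phi_xx(X, C)
A[0, :], A[-1, :] = phi(x[0], x), phi(x[-1], x)
b = f(x)
b[0] = b[-1] = 0.0

# Solution phase: expansion coefficients, then evaluate the approximation.
lam = np.linalg.solve(A, b)
u = phi(X, C) @ lam
print(f"max error vs sin(pi x): {np.max(np.abs(u - np.sin(np.pi * x))):.2e}")
```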

  6. rasterEngine: an easy-to-use R function for applying complex geostatistical models to raster datasets in a parallel computing environment

    NASA Astrophysics Data System (ADS)

    Greenberg, J. A.

    2013-12-01

    As geospatial analyses progress in tandem with increasing availability of large complex geographic data sets and high performance computing (HPC), there is an increasing gap in the ability of end-user tools to take advantage of these advances. Specifically, the practical implementation of complex statistical models on large gridded geographic datasets (e.g. remote sensing analysis, species distribution mapping, topographic transformations, and local neighborhood analyses) currently requires a significant knowledge base. A user must be proficient in the chosen model as well as the nuances of scientific programming, raster data models, memory management, parallel computing, and system design. This is further complicated by the fact that many of the cutting-edge analytical tools were developed for non-geospatial datasets and are not part of standard GIS packages, but are available in scientific computing languages such as R and MATLAB. We present a computing function 'rasterEngine' written in the R scientific computing language and part of the CRAN package 'spatial.tools' with these challenges in mind. The goal of rasterEngine is to allow a user to quickly develop and apply analytical models within the R computing environment to arbitrarily large gridded datasets, taking advantage of available parallel computing resources, and without requiring a deep understanding of HPC and raster data models. We provide several examples of rasterEngine being used to solve common grid based analyses, including remote sensing image analyses, topographic transformations, and species distribution modeling. With each example, the parallel processing performance results are presented.
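
    rasterEngine itself is an R function (in the CRAN package 'spatial.tools'), so the Python sketch below is only an analogue of the pattern it automates: split a large grid into chunks, apply a user function to each chunk on a pool of workers, and reassemble. The names here are hypothetical, not the rasterEngine API:

```python
import numpy as np
from multiprocessing import Pool

def focal_mean(block):
    """User-supplied per-chunk model: here a trivial row-wise smooth.
    In practice this is where a regression, classifier, or transform would go."""
    out = block.copy()
    out[1:-1] = (block[:-2] + block[1:-1] + block[2:]) / 3.0
    return out

def raster_engine(raster, fun, n_chunks=4, processes=4):
    # Split the grid into row blocks, process them in parallel, reassemble.
    chunks = np.array_split(raster, n_chunks, axis=0)
    with Pool(processes) as pool:
        results = pool.map(fun, chunks)
    return np.vstack(results)

if __name__ == "__main__":
    raster = np.random.default_rng(0).random((1000, 1000))
    print(raster_engine(raster, focal_mean).shape)
```

    A real implementation also passes halo rows at chunk boundaries so neighborhood operations stay correct at the seams, one of the details a tool like rasterEngine handles for the user.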

  7. Data parallelism

    SciTech Connect

    Gorda, B.C.

    1992-09-01

    Data locality is fundamental to performance on distributed memory parallel architectures. Application programmers know this well and go to great pains to arrange data for optimal performance. Data Parallelism, a model from the Single Instruction Multiple Data (SIMD) architecture, is finding a new home on the Multiple Instruction Multiple Data (MIMD) architectures. This style of programming, distinguished by taking the computation to the data, is what programmers have been doing by hand for a long time. Recent work in this area holds the promise of making the programmer's task easier.

  8. Data parallelism

    SciTech Connect

    Gorda, B.C.

    1992-09-01

    Data locality is fundamental to performance on distributed memory parallel architectures. Application programmers know this well and go to great pains to arrange data for optimal performance. Data Parallelism, a model from the Single Instruction Multiple Data (SIMD) architecture, is finding a new home on the Multiple Instruction Multiple Data (MIMD) architectures. This style of programming, distinguished by taking the computation to the data, is what programmers have been doing by hand for a long time. Recent work in this area holds the promise of making the programmer's task easier.

  9. Parallel Total Energy

    Energy Science and Technology Software Center (ESTSC)

    2004-10-21

    This is a total energy electronic structure code using the Local Density Approximation (LDA) of density functional theory. It uses plane waves as the wave-function basis set. It can use both norm-conserving pseudopotentials and ultrasoft pseudopotentials. It can relax the atomic positions according to the total energy. It is a parallel code using MPI.

  10. Functional development of mechanosensitive hair cells in stem cell-derived organoids parallels native vestibular hair cells

    PubMed Central

    Liu, Xiao-Ping; Koehler, Karl R.; Mikosz, Andrew M.; Hashino, Eri; Holt, Jeffrey R.

    2016-01-01

    Inner ear sensory epithelia contain mechanosensitive hair cells that transmit information to the brain through innervation with bipolar neurons. Mammalian hair cells do not regenerate and are limited in number. Here we investigate the potential to generate mechanosensitive hair cells from mouse embryonic stem cells in a three-dimensional (3D) culture system. The system faithfully recapitulates mouse inner ear induction followed by self-guided development into organoids that morphologically resemble inner ear vestibular organs. We find that organoid hair cells acquire mechanosensitivity equivalent to functionally mature hair cells in postnatal mice. The organoid hair cells also progress through a similar dynamic developmental pattern of ion channel expression, reminiscent of two subtypes of native vestibular hair cells. We conclude that our 3D culture system can generate large numbers of fully functional sensory cells which could be used to investigate mechanisms of inner ear development and disease as well as regenerative mechanisms for inner ear repair. PMID:27215798

  11. Functional development of mechanosensitive hair cells in stem cell-derived organoids parallels native vestibular hair cells.

    PubMed

    Liu, Xiao-Ping; Koehler, Karl R; Mikosz, Andrew M; Hashino, Eri; Holt, Jeffrey R

    2016-01-01

    Inner ear sensory epithelia contain mechanosensitive hair cells that transmit information to the brain through innervation with bipolar neurons. Mammalian hair cells do not regenerate and are limited in number. Here we investigate the potential to generate mechanosensitive hair cells from mouse embryonic stem cells in a three-dimensional (3D) culture system. The system faithfully recapitulates mouse inner ear induction followed by self-guided development into organoids that morphologically resemble inner ear vestibular organs. We find that organoid hair cells acquire mechanosensitivity equivalent to functionally mature hair cells in postnatal mice. The organoid hair cells also progress through a similar dynamic developmental pattern of ion channel expression, reminiscent of two subtypes of native vestibular hair cells. We conclude that our 3D culture system can generate large numbers of fully functional sensory cells which could be used to investigate mechanisms of inner ear development and disease as well as regenerative mechanisms for inner ear repair. PMID:27215798

  12. A study of parallelizing O(N) Green-function-based Monte Carlo method for many fermions coupled with classical degrees of freedom

    NASA Astrophysics Data System (ADS)

    Zhang, Shixun; Yamagia, Shinichi; Yunoki, Seiji

    2013-08-01

    Models of fermions interacting with classical degrees of freedom are applied to a large variety of systems in condensed matter physics. For this class of models, Weiße [Phys. Rev. Lett. 102, 150604 (2009)] has recently proposed a very efficient numerical method, called the O(N) Green-Function-Based Monte Carlo (GFMC) method, in which a kernel polynomial expansion technique is used to avoid the full numerical diagonalization of the fermion Hamiltonian matrix of size N, which usually costs O(N³) computational complexity. Motivated by this background, in this paper we apply the GFMC method to the double exchange model in three spatial dimensions. We mainly focus on the implementation of the GFMC method using both MPI on a CPU-based cluster and Nvidia's Compute Unified Device Architecture (CUDA) programming techniques on a GPU-based (Graphics Processing Unit based) cluster. The time complexity of the algorithm and the parallel implementation details on the clusters are discussed. We also show the performance scaling for increasing Hamiltonian matrix size and increasing number of nodes, respectively. The performance evaluation indicates that for a 32³ Hamiltonian a single GPU shows performance equivalent to more than 30 CPU cores parallelized using MPI.
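
    The kernel polynomial expansion underpinning the O(N) scaling replaces diagonalization with Chebyshev moment recursions built from sparse matrix-vector products. A minimal Python sketch of the stochastic moment estimation (not the paper's GFMC sampling or its MPI/CUDA parallelization):

```python
import numpy as np

def chebyshev_moments(H_tilde, n_moments=64, n_random=10, seed=0):
    """Stochastically estimate mu_n = Tr[T_n(H~)] / N for a Hamiltonian rescaled
    so its spectrum lies in (-1, 1). Each step costs one mat-vec, hence O(N)
    for sparse H; this is what lets the method avoid O(N^3) diagonalization."""
    rng = np.random.default_rng(seed)
    N = H_tilde.shape[0]
    mu = np.zeros(n_moments)
    for _ in range(n_random):
        v0 = rng.choice([-1.0, 1.0], size=N)    # random +-1 probe vector
        v1 = H_tilde @ v0
        mu[0] += v0 @ v0
        mu[1] += v0 @ v1
        vm, vc = v0, v1
        for n in range(2, n_moments):
            vn = 2.0 * (H_tilde @ vc) - vm      # Chebyshev recursion for T_n
            mu[n] += v0 @ vn
            vm, vc = vc, vn
    return mu / (n_random * N)

# Tiny dense example; real applications use sparse H of much larger dimension.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 200)); H = (A + A.T) / 2
H_tilde = H / (np.linalg.norm(H, 2) * 1.01)     # rescale spectrum into (-1, 1)
print(chebyshev_moments(H_tilde)[:4])
```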

  13. A parallel implementation of the analytic nuclear gradient for time-dependent density functional theory within the Tamm-Dancoff approximation

    NASA Astrophysics Data System (ADS)

    Liu, Fenglai; Gan, Zhengting; Shao, Yihan; Hsu, Chao-Ping; Dreuw, Andreas; Head-Gordon, Martin; Miller, Benjamin T.; Brooks, Bernard R.; Yu, Jian-Guo; Furlani, Thomas R.; Kong, Jing

    2010-10-01

    We derived the analytic gradient for the excitation energies from a time-dependent density functional theory calculation within the Tamm-Dancoff approximation (TDDFT/TDA) using Gaussian atomic orbital basis sets, and introduced an efficient serial and parallel implementation. Some timing results are shown from a B3LYP/6-31G**/SG-1-grid calculation on zincporphyrin. We also performed TDDFT/TDA geometry optimizations for low-lying excited states of 20 small molecules, and compared adiabatic excitation energies and optimized geometry parameters to experimental values using the B3LYP and ωB97 functionals. There are only minor differences between TDDFT and TDA optimized excited state geometries and adiabatic excitation energies. Optimized bond lengths are in better agreement with experiment for both functionals than either CC2 or SOS-CIS(D0), while adiabatic excitation energies are in similar or slightly poorer agreement. Optimized bond angles with both functionals are more accurate than CIS values, but less accurate than either CC2 or SOS-CIS(D0) ones.

  14. Drawing a high-resolution functional map of adeno-associated virus capsid by massively parallel sequencing

    PubMed Central

    Adachi, Kei; Enoki, Tatsuji; Kawano, Yasuhiro; Veraz, Michael; Nakai, Hiroyuki

    2014-01-01

    Adeno-associated virus (AAV) capsid engineering is an emerging approach to advance gene therapy. However, a systematic analysis on how each capsid amino acid contributes to multiple functions remains challenging. Here we show proof-of-principle and successful application of a novel approach, termed AAV Barcode-Seq, that allows us to characterize phenotypes of hundreds of different AAV strains in a high-throughput manner and therefore overcomes technical difficulties in the systematic analysis. In this approach, we generate DNA barcode-tagged AAV libraries and determine a spectrum of phenotypes of each AAV strain by Illumina barcode sequencing. By applying this method to AAV capsid mutant libraries tagged with DNA barcodes, we can draw a high-resolution map of AAV capsid amino acids important for the structural integrity and functions including receptor binding, tropism, neutralization and blood clearance. Thus, Barcode-Seq provides a new tool to generate a valuable resource for virus and gene therapy research. PMID:24435020

  15. Resonance line transfer calculations by doubling thin layers. I - Comparison with other techniques. II - The use of the R-parallel redistribution function. [planetary atmospheres

    NASA Technical Reports Server (NTRS)

    Yelle, Roger V.; Wallace, Lloyd

    1989-01-01

    A versatile and efficient technique for the solution of the resonance line scattering problem with frequency redistribution in planetary atmospheres is introduced. Similar to the doubling approach commonly used in monochromatic scattering problems, the technique has been extended to include the frequency dependence of the radiation field. Methods for solving problems with external or internal sources and coupled spectral lines are presented, along with comparison of some sample calculations with results from Monte Carlo and Feautrier techniques. The doubling technique has also been applied to the solution of resonance line scattering problems where the R-parallel redistribution function is appropriate, both neglecting and including polarization as developed by Yelle and Wallace (1989). With the constraint that the atmosphere is illuminated from the zenith, the only difficulty of consequence is that of performing precise frequency integrations over the line profiles. With that problem solved, it is no longer necessary to use the Monte Carlo method to solve this class of problem.

  16. Parallel Information Processing.

    ERIC Educational Resources Information Center

    Rasmussen, Edie M.

    1992-01-01

    Examines parallel computer architecture and the use of parallel processors for text. Topics discussed include parallel algorithms; performance evaluation; parallel information processing; parallel access methods for text; parallel and distributed information retrieval systems; parallel hardware for text; and network models for information…

  17. NOCA-1 functions with γ-tubulin and in parallel to Patronin to assemble non-centrosomal microtubule arrays in C. elegans

    PubMed Central

    Wang, Shaohe; Wu, Di; Quintin, Sophie; Green, Rebecca A; Cheerambathur, Dhanya K; Ochoa, Stacy D; Desai, Arshad; Oegema, Karen

    2015-01-01

    Non-centrosomal microtubule arrays assemble in differentiated tissues to perform mechanical and transport-based functions. In this study, we identify Caenorhabditis elegans NOCA-1 as a protein with homology to vertebrate ninein. NOCA-1 contributes to the assembly of non-centrosomal microtubule arrays in multiple tissues. In the larval epidermis, NOCA-1 functions redundantly with the minus end protection factor Patronin/PTRN-1 to assemble a circumferential microtubule array essential for worm growth and morphogenesis. Controlled degradation of a γ-tubulin complex subunit in this tissue revealed that γ-tubulin acts with NOCA-1 in parallel to Patronin/PTRN-1. In the germline, NOCA-1 and γ-tubulin co-localize at the cell surface, and inhibiting either leads to a microtubule assembly defect. γ-tubulin targets independently of NOCA-1, but NOCA-1 targeting requires γ-tubulin when a non-essential putatively palmitoylated cysteine is mutated. These results show that NOCA-1 acts with γ-tubulin to assemble non-centrosomal arrays in multiple tissues and highlight functional overlap between the ninein and Patronin protein families. DOI: http://dx.doi.org/10.7554/eLife.08649.001 PMID:26371552

  18. Fracture problem for an external circumferential crack in a functionally graded superconducting cylinder subjected to a parallel magnetic field

    NASA Astrophysics Data System (ADS)

    Yan, Z.; Gao, S. W.; Feng, W. J.

    2016-02-01

    In this study, the multiple isoparametric finite element method (MIFEM) is used to investigate the external circumferential crack problem of a functionally graded superconducting cylinder subjected to electromagnetic forces. The superconducting cylinder is composed of a Bi2223/Ag composite with varying material parameters. A crack reference region is defined to reflect the effects of the crack on flux and current densities, and the magnetically impermeable crack surface condition and the generalized Irie-Yamafuji critical state model outside the crack region are adopted. The distributions of magnetic flux density in the superconducting cylinder are obtained analytically for both the zero-field cooling (ZFC) and the field cooling (FC) activation processes. Based on the MIFEM, the stress intensity factors (SIFs) at the crack fronts during field ascent and/or descent are then numerically calculated. It is interesting to note from the numerical results that, for the present crack model in the ZFC activation process, the crack propagates and grows easily as the applied field increases, whereas in the field descent process of either the ZFC or the FC case the crack generally does not propagate. In addition, in the field ascent process of the ZFC case, the SIFs depend not only on the crack depths and model parameters but also on the applied field. The present study should be helpful for the design and application of high-temperature superconductors with external edge cracks.

  19. Parallel processor engine model program

    NASA Technical Reports Server (NTRS)

    Mclaughlin, P.

    1984-01-01

    The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.

  20. Parallel assessment of male reproductive function in workers and wild rats exposed to pesticides in banana plantations in Guadeloupe

    PubMed Central

    Multigner, Luc; Kadhel, Philippe; Pascal, Michel; Huc-Terki, Farida; Kercret, Henri; Massart, Catherine; Janky, Eustase; Auger, Jacques; Jégou, Bernard

    2008-01-01

    Background There is increasing evidence that reproductive abnormalities are increasing in frequency in both human population and among wild fauna. This increase is probably related to exposure to toxic contaminants in the environment. The use of sentinel species to raise alarms relating to human reproductive health has been strongly recommended. However, no simultaneous studies at the same site have been carried out in recent decades to evaluate the utility of wild animals for monitoring human reproductive disorders. We carried out a joint study in Guadeloupe assessing the reproductive function of workers exposed to pesticides in banana plantations and of male wild rats living in these plantations. Methods A cross-sectional study was performed to assess semen quality and reproductive hormones in banana workers and in men working in non-agricultural sectors. These reproductive parameters were also assessed in wild rats captured in the plantations and were compared with those in rats from areas not directly polluted by humans. Results No significant difference in sperm characteristics and/or hormones was found between workers exposed and not exposed to pesticide. By contrast, rats captured in the banana plantations had lower testosterone levels and gonadosomatic indices than control rats. Conclusion Wild rats seem to be more sensitive than humans to the effects of pesticide exposure on reproductive health. We conclude that the concept of sentinel species must be carefully validated, as the actual nature of exposure may vary between human and wild species, as may the vulnerable time period of exposure and various ecological factors. PMID:18667078

  1. FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

    NASA Astrophysics Data System (ADS)

    Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

    2009-12-01

    FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems.

    Program summary
    Program title: FILMPAR
    Catalogue identifier: AEEL_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 530 421
    No. of bytes in distributed program, including test data, etc.: 1 960 313
    Distribution format: tar.gz
    Programming language: C++ and MPI
    Computer: Desktop, server
    Operating system: Unix/Linux, Mac OS X
    Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors
    RAM: 512 MBytes
    Classification: 12
    External routines: GNU C/C++, MPI
    Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering

  2. Executive functioning as a mediator of conduct problems prevention in children of homeless families residing in temporary supportive housing: a parallel process latent growth modeling approach.

    PubMed

    Piehler, Timothy F; Bloomquist, Michael L; August, Gerald J; Gewirtz, Abigail H; Lee, Susanne S; Lee, Wendy S C

    2014-01-01

    A culturally diverse sample of formerly homeless youth (ages 6-12) and their families (n = 223) participated in a cluster randomized controlled trial of the Early Risers conduct problems prevention program in a supportive housing setting. Parents provided 4 annual behaviorally-based ratings of executive functioning (EF) and conduct problems, including at baseline, over 2 years of intervention programming, and at a 1-year follow-up assessment. Using intent-to-treat analyses, a multilevel latent growth model revealed that the intervention group demonstrated reduced growth in conduct problems over the 4 assessment points. In order to examine mediation, a multilevel parallel process latent growth model was used to simultaneously model growth in EF and growth in conduct problems along with intervention status as a covariate. A significant mediational process emerged, with participation in the intervention promoting growth in EF, which predicted negative growth in conduct problems. The model was consistent with changes in EF fully mediating intervention-related changes in youth conduct problems over the course of the study. These findings highlight the critical role that EF plays in behavioral change and lends further support to its importance as a target in preventive interventions with populations at risk for conduct problems. PMID:24141709

  3. Executive Functioning as a Mediator of Conduct Problems Prevention in Children of Homeless Families Residing in Temporary Supportive Housing: A Parallel Process Latent Growth Modeling Approach

    PubMed Central

    Piehler, Timothy F.; Bloomquist, Michael L.; August, Gerald J.; Gewirtz, Abigail H.; Lee, Susanne S.; Lee, Wendy S. C.

    2013-01-01

    A culturally diverse sample of formerly homeless youth (ages 6 – 12) and their families (n=223) participated in a cluster randomized controlled trial of the Early Risers conduct problems prevention program in a supportive housing setting. Parents provided 4 annual behaviorally-based ratings of executive functioning (EF) and conduct problems, including at baseline, over 2 years of intervention programming, and at a 1-year follow-up assessment. Using intent-to-treat analyses, a multilevel latent growth model revealed that the intervention group demonstrated reduced growth in conduct problems over the 4 assessment points. In order to examine mediation, a multilevel parallel process latent growth model was used to simultaneously model growth in EF and growth in conduct problems along with intervention status as a covariate. A significant mediational process emerged, with participation in the intervention promoting growth in EF, which predicted negative growth in conduct problems. The model was consistent with changes in EF fully mediating intervention-related changes in youth conduct problems over the course of the study. These findings highlight the critical role that EF plays in behavioral change and lends further support to its importance as a target in preventive interventions with populations at risk for conduct problems. PMID:24141709

  4. Parallel Programming in the Age of Ubiquitous Parallelism

    NASA Astrophysics Data System (ADS)

    Pingali, Keshav

    2014-04-01

    Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs

  5. Ultrascalable petaflop parallel supercomputer

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  6. Special parallel processing workshop

    SciTech Connect

    1994-12-01

    This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts in parallel processing.

  7. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems, investigating algorithms that may lead to their effective parallelization. Both the forward- and backward-chained control paradigms were investigated in the course of this work, along with the best computer architectures for the developed algorithms. Two experimental vehicles were developed to facilitate this research: Backpac, a parallel backward-chained rule-based reasoning system, and Datapac, a parallel forward-chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying future to a function causes the function to become a task running in parallel with the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors: an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32-processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines; the Multimax has all its processors hung off a common bus. All are shared-memory machines, but they have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10-processor Encore and the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped-down version of EMYCIN.
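
    Multilisp's future construct has close analogues in modern languages. The Python rendering below is an analogy only (not Multilisp): wrapping a call as a future turns it into a task that runs in parallel with the spawning task, and asking for the result blocks until it is ready.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def match_phase(rule_partition):
    # Stand-in for one worker's match/recognize/act work on its rule partition.
    time.sleep(0.1)
    return f"fired rules in partition {rule_partition}"

with ThreadPoolExecutor(max_workers=4) as pool:
    # Like applying `future` to a call: each submission becomes a parallel task.
    futures = [pool.submit(match_phase, p) for p in range(4)]
    # .result() blocks until the corresponding task finishes (touching a future).
    for fut in futures:
        print(fut.result())
```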

  8. Matpar: Parallel Extensions for MATLAB

    NASA Technical Reports Server (NTRS)

    Springer, P. L.

    1998-01-01

    Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.

  9. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  10. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper describes recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. It presents rendering algorithms, developed for massively parallel processors (MPPs), for polygon, sphere, and volumetric data. The polygon algorithm uses a data-parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations of these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  11. Parallel Analysis of mRNA and microRNA Microarray Profiles to Explore Functional Regulatory Patterns in Polycystic Kidney Disease: Using PKD/Mhm Rat Model

    PubMed Central

    Dweep, Harsh; Sticht, Carsten; Kharkar, Asawari; Pandey, Priyanka; Gretz, Norbert

    2013-01-01

    Autosomal polycystic kidney disease (ADPKD) is a frequent monogenic renal disease, characterised by fluid-filled cysts that are thought to result from multiple deregulated pathways such as cell proliferation and apoptosis. MicroRNAs (miRNAs) are small non-coding RNAs that regulate the expression of many genes associated with such biological processes and human pathologies. To explore the possible regulatory role of miRNAs in PKD, the PKD/Mhm (cy/+) rat, served as a model to study human ADPKD. A parallel microarray-based approach was conducted to profile the expression changes of mRNAs and miRNAs in PKD/Mhm rats. 1,573 up- and 1,760 down-regulated genes were differentially expressed in PKD/Mhm. These genes are associated with 17 pathways (such as focal adhesion, cell cycle, ECM-receptor interaction, DNA replication and metabolic pathways) and 47 (e.g., cell proliferation, Wnt and Tgfβ signaling) Gene Ontologies. Furthermore, we found the similar expression patterns of deregulated genes between PKD/Mhm (cy/+) rat and human ADPKD, PKD1L3/L3, PKD1−/−, Hnf1α-deficient, and Glis2lacZ/lacZ models. Additionally, several differentially regulated genes were noted to be target hubs for miRNAs. We also obtained 8 significantly up-regulated miRNAs (rno-miR-199a-5p, −214, −146b, −21, −34a, −132, −31 and −503) in diseased kidneys of PKD/Mhm rats. Additionally, the binding site overrepresentation and pathway enrichment analyses were accomplished on the putative targets of these 8 miRNAs. 7 out of these 8 miRNAs and their possible interactions have not been previously described in ADPKD. We have shown a strong overlap of functional patterns (pathways) between deregulated miRNAs and mRNAs in the PKD/Mhm (cy/+) rat model. Our findings suggest that several miRNAs may be associated in regulating pathways in ADPKD. We further describe novel miRNAs and their possible targets in ADPKD, which will open new avenues to understand the pathogenesis of human ADPKD

  12. MPP parallel forth

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    1987-01-01

    Massively Parallel Processor (MPP) Parallel FORTH is a derivative of FORTH-83 and Unified Software Systems' Uni-FORTH. The extension of FORTH into the realm of parallel processing on the MPP is described. With few exceptions, Parallel FORTH was made to follow the description of Uni-FORTH as closely as possible. Likewise, the parallel FORTH extensions were designed as philosophically similar to serial FORTH as possible. The MPP hardware characteristics, as viewed by the FORTH programmer, is discussed. Then a description is presented of how parallel FORTH is implemented on the MPP.

  13. Bounded Parallel-Batch Scheduling on Unrelated Parallel Machines

    NASA Astrophysics Data System (ADS)

    Miao, Cuixia; Zhang, Yuzhong; Wang, Chengfei

    In this paper, we consider the bounded parallel-batch scheduling problem on unrelated parallel machines. Problems Rm|B|F are NP-hard for any objective function F. For this reason, we discuss the special case with p_ij = p_i for i = 1, 2, …, m, j = 1, 2, …, n. We give optimal algorithms for the general scheduling problems of minimizing total weighted completion time, makespan, and the number of tardy jobs. And we design pseudo-polynomial-time algorithms for the case with rejection penalty, minimizing the makespan and the total weighted completion time plus the total penalty of the rejected jobs, respectively.

  14. Parallel flow diffusion battery

    DOEpatents

    Yeh, H.C.; Cheng, Y.S.

    1984-01-01

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  15. Parallel flow diffusion battery

    DOEpatents

    Yeh, Hsu-Chi; Cheng, Yung-Sung

    1984-08-07

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  16. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  17. Parallel Atomistic Simulations

    SciTech Connect

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.
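
    Of the three molecular dynamics decompositions mentioned, replicated data is the simplest to sketch: every rank holds all coordinates, computes a slice of the pairwise forces, and a global sum reassembles the full force array. A minimal mpi4py sketch (assuming mpi4py is installed; the pair slicing and Lennard-Jones force are deliberately naive, with no cutoffs or neighbor lists):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Replicated data: every rank holds all N positions (broadcast from rank 0).
N = 64
pos = np.random.default_rng(0).random((N, 3)) * 10.0 if rank == 0 else np.empty((N, 3))
comm.Bcast(pos, root=0)

# Each rank computes forces for its slice of i-indices (naive O(N^2) Lennard-Jones).
local = np.zeros((N, 3))
for i in range(rank, N, size):
    for j in range(i + 1, N):
        r = pos[i] - pos[j]
        r2 = r @ r
        f = (48.0 / r2**7 - 24.0 / r2**4) * r   # LJ force with sigma = epsilon = 1
        local[i] += f
        local[j] -= f

# Global sum reassembles the full force array on every rank.
forces = np.empty_like(local)
comm.Allreduce(local, forces, op=MPI.SUM)
if rank == 0:
    print("net force (should be ~0):", forces.sum(axis=0))
```

    Run with, for example, mpiexec -n 4 python replicated_md.py.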

  18. Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

    SciTech Connect

    Azmy, Y.Y.; Barnett, D.A.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  19. Multitasking TORT under UNICOS: Parallel performance models and measurements

    SciTech Connect

    Barnett, A.; Azmy, Y.Y.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
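
    Neither abstract reproduces the model itself, but parallel overhead models of this kind typically decompose runtime into an ideally divisible compute term plus overhead terms that grow with processor count. A generic illustration (an assumed form for exposition, not the actual TORT model):

```python
def predicted_time(t_serial, p, alpha=0.05, beta=0.002):
    """Generic parallel performance model: perfectly divisible work (t_serial / p)
    plus a fixed per-run overhead (alpha) and a per-processor coordination
    cost (beta * p). Models of this general flavor are fit to measurements."""
    return t_serial / p + alpha + beta * p

t1 = 100.0
for p in (1, 4, 16, 64):
    t = predicted_time(t1, p)
    print(f"P={p:3d}  time={t:7.2f}s  speedup={t1 / t:6.2f}")
```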

  20. Linearly exact parallel closures for slab geometry

    SciTech Connect

    Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

    2013-08-15

    Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).

  1. Linearly exact parallel closures for slab geometry

    NASA Astrophysics Data System (ADS)

    Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

    2013-08-01

    Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).

  2. Improving the spatial accuracy in functional magnetic resonance imaging (fMRI) based on the blood oxygenation level dependent (BOLD) effect: benefits from parallel imaging and a 32-channel head array coil at 1.5 Tesla.

    PubMed

    Fellner, C; Doenitz, C; Finkenzeller, T; Jung, E M; Rennert, J; Schlaier, J

    2009-01-01

    Geometric distortions and low spatial resolution are current limitations in functional magnetic resonance imaging (fMRI). The aim of this study was to evaluate if application of parallel imaging or significant reduction of voxel size in combination with a new 32-channel head array coil can reduce those drawbacks at 1.5 T for a simple hand motor task. Therefore, maximum t-values (tmax) in different regions of activation, time-dependent signal-to-noise ratios (SNR(t)) as well as distortions within the precentral gyrus were evaluated. Comparing fMRI with and without parallel imaging in 17 healthy subjects revealed significantly reduced geometric distortions in anterior-posterior direction. Using parallel imaging, tmax only showed a mild reduction (7-11%) although SNR(t) was significantly diminished (25%). In 7 healthy subjects high-resolution (2 x 2 x 2 mm³) fMRI was compared with standard fMRI (3 x 3 x 3 mm³) in a 32-channel coil and with high-resolution fMRI in a 12-channel coil. The new coil yielded a clear improvement for tmax (21-32%) and SNR(t) (51%) in comparison with the 12-channel coil. Geometric distortions were smaller due to the smaller voxel size. Therefore, the reduction in tmax (8-16%) and SNR(t) (52%) in the high-resolution experiment seems to be tolerable with this coil. In conclusion, parallel imaging is an alternative to reduce geometric distortions in fMRI at 1.5 T. Using a 32-channel coil, reduction of the voxel size might be the preferable way to improve spatial accuracy. PMID:19713602
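
    The parallel-imaging SNR penalty discussed here is usually quantified by the standard SENSE relation of Pruessmann et al., with R the acceleration factor and g >= 1 the coil geometry factor:

```latex
% SNR cost of SENSE acceleration (Pruessmann et al., Magn. Reson. Med. 1999)
\mathrm{SNR}_{\mathrm{SENSE}} = \frac{\mathrm{SNR}_{\mathrm{full}}}{g\,\sqrt{R}}
```

    Even with a g-factor near 1, acceleration costs a factor of the square root of R in SNR, which is why the SNR gain from a 32-channel coil matters for high-resolution protocols like the one studied here.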

  3. Parallel digital forensics infrastructure.

    SciTech Connect

    Liebrock, Lorie M.; Duggan, David Patrick

    2009-10-01

    This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the architecture and implementation of the parallel digital forensics (PDF) infrastructure.

  4. Parallel MR Imaging

    PubMed Central

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A.; Seiberlich, Nicole

    2015-01-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the under-sampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. PMID:22696125
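
    For acceleration R = 2, the SENSE-type reconstruction described above reduces to a small least-squares solve per pixel: each coil sees the sum of two true pixels weighted by its sensitivity. A minimal numpy sketch of that unfolding step (array names and shapes are illustrative, not from this article):

      import numpy as np

      def sense_unfold_r2(aliased, sens):
          """Unfold R=2 SENSE data by per-pixel least squares.

          aliased: (ncoils, ny//2, nx) aliased coil images
          sens:    (ncoils, ny, nx) complex coil sensitivity maps
          returns: (ny, nx) unaliased image
          """
          ncoils, ny_half, nx = aliased.shape
          out = np.zeros((2 * ny_half, nx), dtype=complex)
          for y in range(ny_half):
              for x in range(nx):
                  # The two true pixels that fold onto aliased pixel (y, x).
                  A = np.stack([sens[:, y, x], sens[:, y + ny_half, x]], axis=1)
                  sol, *_ = np.linalg.lstsq(A, aliased[:, y, x], rcond=None)
                  out[y, x], out[y + ny_half, x] = sol
          return out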

  5. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge-base partitions indicate that significant speed increases, including superlinear in some cases, are possible.

  6. A parallel variable metric optimization algorithm

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.

    1973-01-01

    An algorithm designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariate minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
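
    One cycle of the method, as described, can be sketched loosely in Python: p gradient evaluations in parallel, p rank-one corrections to the inverse metric, then a single line minimization along the Newton-like direction. The SR1-style rank-one update below is a generic choice made for illustration; the paper's exact correction formula may differ.

      import numpy as np
      from concurrent.futures import ThreadPoolExecutor

      def pvm_cycle(f, grad, x, H, directions):
          """One illustrative cycle of a parallel variable-metric method."""
          g0 = grad(x)
          # 1) Gradients at p displaced points, evaluated in parallel.
          with ThreadPoolExecutor() as pool:
              grads = list(pool.map(grad, [x + d for d in directions]))
          # 2) p rank-one corrections to the inverse metric H.
          for d, g in zip(directions, grads):
              y = g - g0                    # curvature sampled along d
              r = d - H @ y
              if abs(r @ y) > 1e-12:        # skip near-singular updates
                  H = H + np.outer(r, r) / (r @ y)
          # 3) One univariate minimization in the Newton-like direction.
          p_dir = -H @ g0
          t = min(np.linspace(0.0, 2.0, 200), key=lambda t: f(x + t * p_dir))
          return x + t * p_dir, H

    For a quadratic in R^p sampled along p independent directions, the p rank-one corrections recover the exact inverse Hessian, consistent with the one-cycle convergence result quoted above.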

  7. Social Problems and Deviance: Some Parallel Issues

    ERIC Educational Resources Information Center

    Kitsuse, John I.; Spector, Malcolm

    1975-01-01

    Explores parallel developments in labeling theory and in the value conflict approach to social problems. Similarities in their critiques of functionalism and etiological theory as well as their emphasis on the definitional process are noted. (Author)

  8. Parallel in vivo and in vitro detection of functional somatostatin receptors in human endocrine pancreatic tumors: Consequences with regard to diagnosis, localization, and therapy

    SciTech Connect

    Lamberts, S.W.; Hofland, L.J.; van Koetsveld, P.M.; Reubi, J.C.; Bruining, H.A.; Bakker, W.H.; Krenning, E.P. )

    1990-09-01

    The effects of octreotide in vivo and in vitro on hormone release, in vivo ({sup 123}I)Tyr3-octreotide scanning, and in vitro ({sup 125}I)Tyr3-octreotide autoradiography were compared in five patients with endocrine pancreatic tumors. ({sup 123}I)Tyr3-octreotide scanning localized the primary tumor and/or previously unknown metastases in four of the five patients. The patient with a negative scan had an insulinoma that did not respond to octreotide in vivo. No Tyr3-octreotide-binding sites were subsequently found at autoradiography of the tumor, whereas somatostatin-14 receptors were present at a high density. In parallel, culture studies with the cells prepared from this adenoma showed that insulin release was not affected by octreotide, while both somatostatin-14 and -28 significantly suppressed hormone release. Culture studies of the tumor cells from two gastrinomas showed a dose-dependent inhibition of gastrin release by octreotide. Octreotide exerted direct antiproliferative effects in one of these gastrinomas, which had been shown to be rapidly growing in vivo. Both gastrinomas had specific somatostatin receptors, as measured by in vitro receptor autoradiography. Somatostatin release by the cultured somatostatinoma cells from one of these patients was suppressed by octreotide.

  9. Parallel scheduling algorithms

    SciTech Connect

    Dekel, E.; Sahni, S.

    1983-01-01

    Parallel algorithms are given for scheduling problems such as scheduling to minimize the number of tardy jobs, job sequencing with deadlines, scheduling to minimize earliness and tardiness penalties, channel assignment, and minimizing the mean finish time. The shared memory model of parallel computers is used to obtain fast algorithms. 26 references.

  10. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
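
    A single-node analogue of the scattered/fixed-size decomposition is easy to sketch: precompute the base primes up to sqrt(n) serially, then sieve disjoint blocks in parallel worker processes. This is our illustration of the idea, not the hypercube implementation described above.

      import math
      from multiprocessing import Pool

      def sieve_block(args):
          """Return the primes in [lo, hi), given base primes up to sqrt(hi)."""
          lo, hi, base = args
          flags = bytearray([1]) * (hi - lo)
          for p in base:
              start = max(p * p, ((lo + p - 1) // p) * p)
              flags[start - lo :: p] = b"\x00" * len(flags[start - lo :: p])
          return [lo + i for i, f in enumerate(flags) if f]

      def parallel_sieve(n, nworkers=4):
          root = math.isqrt(n)
          small = bytearray([1]) * (root + 1)   # serial sieve for base primes
          small[:2] = b"\x00\x00"
          for p in range(2, math.isqrt(root) + 1):
              if small[p]:
                  small[p * p :: p] = b"\x00" * len(small[p * p :: p])
          base = [i for i, f in enumerate(small) if f]
          step = (n - 1) // nworkers + 1        # disjoint blocks of the range
          tasks = [(lo, min(lo + step, n + 1), base)
                   for lo in range(2, n + 1, step)]
          with Pool(nworkers) as pool:
              return [p for blk in pool.map(sieve_block, tasks) for p in blk]

      if __name__ == "__main__":
          print(parallel_sieve(100))   # [2, 3, 5, 7, ..., 89, 97]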

  11. Parallel computing works

    SciTech Connect

    Not Available

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  12. Parallel nearest neighbor calculations

    NASA Astrophysics Data System (ADS)

    Trease, Harold

    We are just starting to parallelize the nearest neighbor portion of our free-Lagrange code. Our implementation of the nearest neighbor reconnection algorithm has not been parallelizable (i.e., we just flip one connection at a time). In this paper we consider what sort of nearest neighbor algorithms lend themselves to being parallelized. For example, the construction of the Voronoi mesh can be parallelized, but the construction of the Delaunay mesh (dual to the Voronoi mesh) cannot because of degenerate connections. We will show our most recent attempt to tessellate space with triangles or tetrahedrons with a new nearest neighbor construction algorithm called DAM (Dial-A-Mesh). This method has the characteristics of a parallel algorithm and produces a better tessellation of space than the Delaunay mesh. Parallel processing is becoming an everyday reality for us at Los Alamos. Our current production machines are Cray YMPs with 8 processors that can run independently or combined to work on one job. We are also exploring massive parallelism through the use of two 64K processor Connection Machines (CM2), where all the processors run in lock step mode. The effective application of 3-D computer models requires the use of parallel processing to achieve reasonable "turn around" times for our calculations.

  13. Bilingual parallel programming

    SciTech Connect

    Foster, I.; Overbeek, R.

    1990-01-01

    Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.

  14. Expression of mitochondrial regulatory genes parallels respiratory capacity and contractile function in a rat model of hypoxia-induced right ventricular hypertrophy

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Chronic hypobaric hypoxia (CHH) increases load on the right ventricle (RV) resulting in RV hypertrophy. We hypothesized that CHH elicits distinct responses, i.e., the hypertrophied RV, unlike the left ventricle (LV), displaying enhanced mitochondrial respiratory and contractile function. Wistar rats...

  15. The function of the glutamate–nitric oxide–cGMP pathway in brain in vivo and learning ability decrease in parallel in mature compared with young rats

    PubMed Central

    Piedrafita, Blanca; Cauli, Omar; Montoliu, Carmina; Felipo, Vicente

    2007-01-01

    Aging is associated with cognitive impairment, but the underlying mechanisms remain unclear. We have recently reported that the ability of rats to learn a Y-maze conditional discrimination task depends on the function of the glutamate–nitric oxide–cGMP pathway in brain. The aims of the present work were to assess whether the ability of rats to learn this task decreases with age and whether this reduction is associated with a decreased function of the glutamate–nitric oxide–cGMP pathway in brain in vivo, as analyzed by microdialysis in freely moving rats. We show that 7-mo-old rats need significantly more (192 ± 64%) trials than do 3-mo-old rats to learn the Y-maze task. Moreover, the function of the glutamate–nitric oxide–cGMP pathway is reduced by 60 ± 23% in 7-mo-old rats compared with 3-mo-old rats. The results reported support the idea that the reduction in the ability to learn the Y-maze task (and likely other types of learning) of mature compared with young rats would be a consequence of reduced function of the glutamate–nitric oxide–cGMP pathway. PMID:17412964

  16. The function of the glutamate-nitric oxide-cGMP pathway in brain in vivo and learning ability decrease in parallel in mature compared with young rats.

    PubMed

    Piedrafita, Blanca; Cauli, Omar; Montoliu, Carmina; Felipo, Vicente

    2007-04-01

    Aging is associated with cognitive impairment, but the underlying mechanisms remain unclear. We have recently reported that the ability of rats to learn a Y-maze conditional discrimination task depends on the function of the glutamate-nitric oxide-cGMP pathway in brain. The aims of the present work were to assess whether the ability of rats to learn this task decreases with age and whether this reduction is associated with a decreased function of the glutamate-nitric oxide-cGMP pathway in brain in vivo, as analyzed by microdialysis in freely moving rats. We show that 7-mo-old rats need significantly more (192 +/- 64%) trials than do 3-mo-old rats to learn the Y-maze task. Moreover, the function of the glutamate-nitric oxide-cGMP pathway is reduced by 60 +/- 23% in 7-mo-old rats compared with 3-mo-old rats. The results reported support the idea that the reduction in the ability to learn the Y-maze task (and likely other types of learning) of mature compared with young rats would be a consequence of reduced function of the glutamate-nitric oxide-cGMP pathway. PMID:17412964

  17. The Function of the Glutamate-Nitric Oxide-cGMP Pathway in Brain in Vivo and Learning Ability Decrease in Parallel in Mature Compared with Young Rats

    ERIC Educational Resources Information Center

    Piedrafita, Blanca; Cauli, Omar; Montoliu, Carmina; Felipo, Vicente

    2007-01-01

    Aging is associated with cognitive impairment, but the underlying mechanisms remain unclear. We have recently reported that the ability of rats to learn a Y-maze conditional discrimination task depends on the function of the glutamate-nitric oxide-cGMP pathway in brain. The aims of the present work were to assess whether the ability of rats to…

  18. Processing Semblances Induced through Inter-Postsynaptic Functional LINKs, Presumed Biological Parallels of K-Lines Proposed for Building Artificial Intelligence

    PubMed Central

    Vadakkan, Kunjumon I.

    2011-01-01

    The internal sensation of memory, which is available only to the owner of an individual nervous system, is difficult to analyze for its basic elements of operation. We hypothesize that associative learning induces the formation of functional LINK between the postsynapses. During memory retrieval, the activation of either postsynapse re-activates the functional LINK evoking a semblance of sensory activity arriving at its opposite postsynapse, nature of which defines the basic unit of internal sensation – namely, the semblion. In neuronal networks that undergo continuous oscillatory activity at certain levels of their organization re-activation of functional LINKs is expected to induce semblions, enabling the system to continuously learn, self-organize, and demonstrate instantiation, features that can be utilized for developing artificial intelligence (AI). This paper also explains suitability of the inter-postsynaptic functional LINKs to meet the expectations of Minsky’s K-lines, basic elements of a memory theory generated to develop AI and methods to replicate semblances outside the nervous system. PMID:21845180

  19. Mirror versus parallel bimanual reaching

    PubMed Central

    2013-01-01

    Background In spite of their importance to everyday function, tasks that require both hands to work together such as lifting and carrying large objects have not been well studied and the full potential of how new technology might facilitate recovery remains unknown. Methods To help identify the best modes for self-teleoperated bimanual training, we used an advanced haptic/graphic environment to compare several modes of practice. In a 2-by-2 study, we compared mirror vs. parallel reaching movements, and also compared veridical display to one that transforms the right hand’s cursor to the opposite side, reducing the area that the visual system has to monitor. Twenty healthy, right-handed subjects (5 in each group) practiced 200 movements. We hypothesized that parallel reaching movements would be the best performing, and attending to one visual area would reduce the task difficulty. Results The two-way comparison revealed that mirror movement times took an average 1.24 s longer to complete than parallel. Surprisingly, subjects’ movement times moving to one target (attending to one visual area) also took an average of 1.66 s longer than subjects moving to two targets. For both hands, there was also a significant interaction effect, revealing the lowest errors for parallel movements moving to two targets (p < 0.001). This was the only group that began and maintained low errors throughout training. Conclusion Combined with other evidence, these results suggest that the most intuitive reaching performance can be observed with parallel movements with a veridical display (moving to two separate targets). These results point to the expected levels of challenge for these bimanual training modes, which could be used to advise therapy choices in self-neurorehabilitation. PMID:23837908

  20. Parallel system simulation

    SciTech Connect

    Tai, H.M.; Saeks, R.

    1984-03-01

    A relaxation algorithm for solving large-scale system simulation problems in parallel is proposed. The algorithm, which is composed of both a time-step parallel algorithm and a component-wise parallel algorithm, is described. The interconnected nature of the system, which is characterized by the component connection model, is fully exploited by this approach. A technique for finding an optimal number of the time steps is also described. Finally, this algorithm is illustrated via several examples in which the possible trade-offs between the speed-up ratio, efficiency, and waiting time are analyzed.
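
    The component-wise relaxation idea can be illustrated with Gauss-Jacobi waveform relaxation for y' = Ay: each component is integrated over the entire window using the previous sweep's waveforms for its coupling terms, so the per-component integrations are mutually independent and can run in parallel. A generic sketch (our own, not the authors' algorithm):

      import numpy as np

      def jacobi_waveform_relaxation(A, y0, t1, nsteps, nsweeps):
          """Relax y' = A y over [0, t1] by component-wise waveform iteration."""
          y0 = np.asarray(y0, dtype=float)
          n, h = len(y0), t1 / nsteps
          Y = np.tile(y0, (nsteps + 1, 1))   # initial guess: constant waveforms
          for _ in range(nsweeps):
              Ynew = np.empty_like(Y)
              Ynew[0] = y0
              for i in range(n):             # independent -> parallelizable
                  y = y0[i]
                  for k in range(nsteps):
                      # Coupling taken from the previous sweep's waveforms.
                      coupling = A[i] @ Y[k] - A[i, i] * Y[k, i]
                      y = y + h * (A[i, i] * y + coupling)   # explicit Euler
                      Ynew[k + 1, i] = y
              Y = Ynew
          return Y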

  1. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  2. NWChem: scalable parallel computational chemistry

    SciTech Connect

    van Dam, Hubertus JJ; De Jong, Wibe A.; Bylaska, Eric J.; Govind, Niranjan; Kowalski, Karol; Straatsma, TP; Valiev, Marat

    2011-11-01

    NWChem is a general purpose computational chemistry code specifically designed to run on distributed memory parallel computers. The core functionality of the code focuses on molecular dynamics, Hartree-Fock and density functional theory methods for both plane-wave basis sets as well as Gaussian basis sets, tensor contraction engine based coupled cluster capabilities and combined quantum mechanics/molecular mechanics descriptions. It was realized from the beginning that scalable implementations of these methods required a programming paradigm inherently different from what message passing approaches could offer. In response a global address space library, the Global Array Toolkit, was developed. The programming model it offers is based on using predominantly one-sided communication. This model underpins most of the functionality in NWChem and the power of it is exemplified by the fact that the code scales to tens of thousands of processors. In this paper the core capabilities of NWChem are described as well as their implementation to achieve an efficient computational chemistry code with high parallel scalability. NWChem is a modern, open source, computational chemistry code specifically designed for large scale parallel applications. To meet the challenges of developing efficient, scalable and portable programs of this nature a particular code design was adopted. This code design involved two main features. First of all, the code is built up in a modular fashion so that a large variety of functionality can be integrated easily. Secondly, to facilitate writing complex parallel algorithms the Global Array toolkit was developed. This toolkit allows one to write parallel applications in a shared memory like approach, but offers additional mechanisms to exploit data locality to lower communication overheads. This framework has proven to be very successful in computational chemistry but is applicable to any engineering domain. Within the context created by the features

  3. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
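
    The memory figures follow from plain state-vector simulation: n qubits require 2^n complex amplitudes, and 36 qubits at 16 bytes per complex128 amplitude is 2^36 x 16 B = 1 TiB, matching the ceiling quoted above. The core gate-application kernel, in a toy single-node form (our sketch, not the authors' software):

      import numpy as np

      def apply_single_qubit_gate(state, gate, target, n_qubits):
          """Apply a 2x2 gate to one qubit of a 2**n_qubits state vector."""
          # View the amplitudes as (left, 2, right) and contract the middle
          # axis; 'target' counts qubits from the least significant end.
          left = 2 ** (n_qubits - target - 1)
          right = 2 ** target
          psi = state.reshape(left, 2, right)
          return np.einsum("ab,xby->xay", gate, psi).reshape(-1)

      state = np.zeros(2 ** 3, dtype=complex)
      state[0] = 1.0                               # |000>
      H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
      state = apply_single_qubit_gate(state, H, target=0, n_qubits=3)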

  4. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

  5. Parallels with nature

    NASA Astrophysics Data System (ADS)

    2014-10-01

    Adam Nelson and Stuart Warriner, from the University of Leeds, talk with Nature Chemistry about their work to develop viable synthetic strategies for preparing new chemical structures in parallel with the identification of desirable biological activity.

  6. The Parallel Axiom

    ERIC Educational Resources Information Center

    Rogers, Pat

    1972-01-01

    Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclid's parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)

  7. Simplified Parallel Domain Traversal

    SciTech Connect

    Erickson III, David J

    2011-01-01

    Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO{sub 2} and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.

  8. Partitioning and parallel radiosity

    NASA Astrophysics Data System (ADS)

    Merzouk, S.; Winkler, C.; Paul, J. C.

    1996-03-01

    This paper proposes a theoretical framework, based on domain subdivision, for parallel radiosity. Moreover, three implementation approaches, taking advantage of partitioning algorithms and a global shared memory architecture, are presented.

  9. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth

  10. Parallel image compression

    NASA Technical Reports Server (NTRS)

    Reif, John H.

    1987-01-01

    A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.

  11. Continuous parallel coordinates.

    PubMed

    Heinrich, Julian; Weiskopf, Daniel

    2009-01-01

    Typical scientific data is represented on a grid with appropriate interpolation or approximation schemes, defined on a continuous domain. The visualization of such data in parallel coordinates may reveal patterns latently contained in the data and thus can improve the understanding of multidimensional relations. In this paper, we adopt the concept of continuous scatterplots for the visualization of spatially continuous input data to derive a density model for parallel coordinates. Based on the point-line duality between scatterplots and parallel coordinates, we propose a mathematical model that maps density from a continuous scatterplot to parallel coordinates and present different algorithms for both numerical and analytical computation of the resulting density field. In addition, we show how the 2-D model can be used to successively construct continuous parallel coordinates with an arbitrary number of dimensions. Since continuous parallel coordinates interpolate data values within grid cells, a scalable and dense visualization is achieved, which will be demonstrated for typical multi-variate scientific data. PMID:19834230
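
    The point-line duality invoked here is compact enough to state (our notation): a 2-D data point (x_1, x_2) maps to the line

      y(t) = (1 - t) x_1 + t x_2 ,   t \in [0, 1] ,

    drawn between the two parallel axes at t = 0 and t = 1. Loosely speaking, the density at a location between the axes is then obtained by integrating the scatterplot density over all points whose dual lines pass through that location.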

  12. Parallel re-modeling of EF-1α function: divergent EF-1α genes co-occur with EFL genes in diverse distantly related eukaryotes

    PubMed Central

    2013-01-01

    Background Elongation factor-1α (EF-1α) and elongation factor-like (EFL) proteins are functionally homologous to one another, and are core components of the eukaryotic translation machinery. The patchy distribution of the two elongation factor types across global eukaryotic phylogeny is suggestive of a ‘differential loss’ hypothesis that assumes that EF-1α and EFL were present in the most recent common ancestor of eukaryotes followed by independent differential losses of one of the two factors in the descendant lineages. To date, however, just one diatom and one fungus have been found to have both EF-1α and EFL (dual-EF-containing species). Results In this study, we characterized 35 new EF-1α/EFL sequences from phylogenetically diverse eukaryotes. In so doing we identified 11 previously unreported dual-EF-containing species from diverse eukaryote groups including the Stramenopiles, Apusomonadida, Goniomonadida, and Fungi. Phylogenetic analyses suggested vertical inheritance of both genes in each of the dual-EF lineages. In the dual-EF-containing species we identified, the EF-1α genes appeared to be highly divergent in sequence and suppressed at the transcriptional level compared to the co-occurring EFL genes. Conclusions According to the known EF-1α/EFL distribution, the differential loss process should have occurred independently in diverse eukaryotic lineages, and more dual-EF-containing species remain unidentified. We predict that dual-EF-containing species retain the divergent EF-1α homologues only for a sub-set of the original functions. As the dual-EF-containing species are distantly related to each other, we propose that independent re-modelling of EF-1α function took place in multiple branches in the tree of eukaryotes. PMID:23800323

  13. Finite element computation with parallel VLSI

    NASA Technical Reports Server (NTRS)

    Mcgregor, J.; Salama, M.

    1983-01-01

    This paper describes a parallel processing computer consisting of a 16-bit microcomputer as a master processor which controls and coordinates the activities of 8086/8087 VLSI chip set slave processors working in parallel. The hardware is inexpensive and can be flexibly configured and programmed to perform various functions. This makes it a useful research tool for the development of, and experimentation with, parallel mathematical algorithms. Application of the hardware to computational tasks involved in the finite element analysis method is demonstrated by the generation and assembly of beam finite element stiffness matrices. A number of possible schemes for the implementation of N-elements on N- or n-processors (N is greater than n) are described, and the speedup factors of their time consumption are determined as a function of the number of available parallel processors.

  14. Two-axis acceleration of functional connectivity magnetic resonance imaging by parallel excitation of phase-tagged slices and half k-space acceleration.

    PubMed

    Jesmanowicz, Andrzej; Nencka, Andrew S; Li, Shi-Jiang; Hyde, James S

    2011-01-01

    Whole brain functional connectivity magnetic resonance imaging requires acquisition of a time course of gradient-recalled (GR) volumetric images. A method is developed to accelerate this acquisition using GR echo-planar imaging and radio frequency (RF) slice phase tagging. For N-fold acceleration, a tailored RF pulse excites N slices using a uniform-field transmit coil. This pulse is the Fourier transform of the profile for the N slices with a predetermined RF phase tag on each slice. A multichannel RF receive coil is used for detection. For n slices, there are n/N groups of slices. Signal-averaged reference images are created for each slice within each slice group for each member of the coil array and used to separate overlapping images that are simultaneously received. The time-overhead for collection of reference images is small relative to the acquisition time of a complete volumetric time course. A least-squares singular value decomposition method allows image separation on a pixel-by-pixel basis. Twofold slice acceleration is demonstrated using an eight-channel RF receive coil, with application to resting-state functional magnetic resonance imaging in the human brain. Data from six subjects at 3 T are reported. The method has been extended to half k-space acquisition, which not only provides additional acceleration, but also facilitates slice separation because of increased signal intensity of the central lines of k-space coupled with reduced susceptibility effects. PMID:22432957
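
    Per pixel, the separation step described is a small linear least-squares problem: each receive coil measures a weighted sum of the N simultaneously excited slices, with weights supplied by the reference images. A hedged numpy sketch (shapes and names ours):

      import numpy as np

      def separate_slices(measured, ref):
          """Least-squares separation of N simultaneously excited slices.

          measured: (ncoils, npix) combined complex signal per coil
          ref:      (ncoils, nslices, npix) per-coil reference images
          returns:  (nslices, npix) separated slice signals
          """
          ncoils, nslices, npix = ref.shape
          out = np.zeros((nslices, npix), dtype=complex)
          for px in range(npix):
              A = ref[:, :, px]          # (ncoils, nslices) mixing matrix
              # pinv performs the SVD-based least-squares solve noted above.
              out[:, px] = np.linalg.pinv(A) @ measured[:, px]
          return out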

  15. HOPSPACK: Hybrid Optimization Parallel Search Package.

    SciTech Connect

    Gray, Genetha A.; Kolda, Tamara G.; Griffin, Joshua; Taddy, Matt; Martinez-Canales, Monica

    2008-12-01

    In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel Search Package), a new software platform which facilitates combining multiple optimization routines into a single, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The framework is designed such that existing optimization source code can be easily incorporated with minimal code modification. By maintaining the integrity of each individual solver, the strengths and code sophistication of the original optimization package are retained and exploited.

  16. Parallel time integration software

    Energy Science and Technology Software Center (ESTSC)

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading toward systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieving parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.

  17. Parallel time integration software

    SciTech Connect

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading toward systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieving parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.
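
    For orientation, a two-level version of the multigrid-in-time idea fits in a few lines (with F-relaxation, two-level MGRIT coincides with the well-known parareal iteration). The Euler-based coarse and fine propagators below are generic placeholders, not the package's interface:

      import numpy as np

      def euler(f, y, ta, tb, nsteps):
          """Explicit Euler from ta to tb in nsteps steps."""
          h = (tb - ta) / nsteps
          for k in range(nsteps):
              y = y + h * f(ta + k * h, y)
          return y

      def parareal(f, y0, t0, t1, nslices, nfine, niters):
          """Two-level time-parallel iteration (parareal / MGRIT sketch)."""
          ts = np.linspace(t0, t1, nslices + 1)
          coarse = lambda y, i: euler(f, y, ts[i], ts[i + 1], 1)
          fine = lambda y, i: euler(f, y, ts[i], ts[i + 1], nfine)
          U = [y0]                                 # serial coarse predictor
          for i in range(nslices):
              U.append(coarse(U[i], i))
          for _ in range(niters):
              F = [fine(U[i], i) for i in range(nslices)]   # parallel in time
              Unew = [y0]
              for i in range(nslices):             # cheap serial correction
                  Unew.append(coarse(Unew[i], i) + F[i] - coarse(U[i], i))
              U = Unew
          return ts, U

      # Example: decay ODE y' = -y on [0, 2]; U approaches exp(-t) at ts.
      ts, U = parareal(lambda t, y: -y, 1.0, 0.0, 2.0,
                       nslices=10, nfine=100, niters=3)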

  18. Fus3p and Kss1p control G1 arrest in Saccharomyces cerevisiae through a balance of distinct arrest and proliferative functions that operate in parallel with Far1p.

    PubMed Central

    Cherkasova, V; Lyons, D M; Elion, E A

    1999-01-01

    In Saccharomyces cerevisiae, mating pheromones activate two MAP kinases (MAPKs), Fus3p and Kss1p, to induce G1 arrest prior to mating. Fus3p is known to promote G1 arrest by activating Far1p, which inhibits three Clnp/Cdc28p kinases. To analyze the contribution of Fus3p and Kss1p to G1 arrest that is independent of Far1p, we constructed far1 CLN strains that undergo G1 arrest from increased activation of the mating MAP kinase pathway. We find that Fus3p and Kss1p both control G1 arrest through multiple functions that operate in parallel with Far1p. Fus3p and Kss1p together promote G1 arrest by repressing transcription of G1/S cyclin genes (CLN1, CLN2, CLB5) by a mechanism that blocks their activation by Cln3p/Cdc28p kinase. In addition, Fus3p and Kss1p counteract G1 arrest through overlapping and distinct functions. Fus3p and Kss1p together increase the expression of CLN3 and PCL2 genes that promote budding, and Kss1p inhibits the MAP kinase cascade. Strikingly, Fus3p promotes proliferation by a novel function that is not linked to reduced Ste12p activity or increased levels of Cln2p/Cdc28p kinase. Genetic analysis suggests that Fus3p promotes proliferation through activation of Mcm1p transcription factor that upregulates numerous genes in G1 phase. Thus, Fus3p and Kss1p control G1 arrest through a balance of arrest functions that inhibit the Cdc28p machinery and proliferative functions that bypass this inhibition. PMID:10049917

  19. Parallel optical sampler

    SciTech Connect

    Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A

    2014-05-20

    An optical sampler includes first and second 1×n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode interconnected Mach-Zehnder Modulator. A method of sampling the optical analog input signal is disclosed.

  20. The NAS Parallel Benchmarks

    SciTech Connect

    Bailey, David H.

    2009-11-15

    The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage

  1. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

  2. Speeding up parallel processing

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
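
    The contrast between the skeptics and the Sandia results is exactly the difference between fixed-size and scaled-size speedup. With serial fraction s and p processors (standard formulas, stated here for reference):

      S_{\mathrm{Amdahl}}(p) = \frac{1}{s + (1 - s)/p} ,
      \qquad
      S_{\mathrm{scaled}}(p) = s + (1 - s)\,p .

    A speedup of 1000 on p = 1024 requires s < 2.4e-5 under Amdahl's law, but only s < 0.024 under the scaled (Gustafson-Barsis) law, which is why the fixed-size and scalable-problem results quoted above differ so sharply in character.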

  3. Programming parallel vision algorithms

    SciTech Connect

    Shapiro, L.G.

    1988-01-01

    Computer vision requires the processing of large volumes of data and requires parallel architectures and algorithms to be useful in real-time, industrial applications. The INSIGHT dataflow language was designed to allow encoding of vision algorithms at all levels of the computer vision paradigm. INSIGHT programs, which are relational in nature, can be translated into a graph structure that represents an architecture for solving a particular vision problem or a configuration of a reconfigurable computational network. The authors consider here INSIGHT programs that produce a parallel net architecture for solving low-, mid-, and high-level vision tasks.

  4. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Among the highly parallel computing architectures required for advanced scientific computation, those designated 'MIMD' and 'SIMD' have yielded the best results to date. The present evaluation of the development status of such architectures shows that neither has attained a decisive advantage in the treatment of most near-homogeneous problems; for problems involving numerous dissimilar parts, however, such currently speculative architectures as 'neural networks' or 'data flow' machines may be entailed. Data flow computers are the most practical form of MIMD fine-grained parallel computers yet conceived; they automatically solve the problem of assigning virtual processors to the real processors in the machine.

  5. Coarrays for Parallel Processing

    NASA Technical Reports Server (NTRS)

    Snyder, W. Van

    2011-01-01

    The design of the Coarray feature of Fortran 2008 was guided by answering the question "What is the smallest change required to convert Fortran to a robust and efficient parallel language." Two fundamental issues that any parallel programming model must address are work distribution and data distribution. In order to coordinate work distribution and data distribution, methods for communication and synchronization must be provided. Although originally designed for Fortran, the Coarray paradigm has stimulated development in other languages. X10, Chapel, UPC, Titanium, and class libraries being developed for C++ have the same conceptual framework.

  6. Asynchronous parallel pattern search for nonlinear optimization

    SciTech Connect

    P. D. Hough; T. G. Kolda; V. J. Torczon

    2000-01-01

    Parallel pattern search (PPS) can be quite useful for engineering optimization problems characterized by a small number of variables (say 10--50) and by expensive objective function evaluations such as complex simulations that take from minutes to hours to run. However, PPS, which was originally designed for execution on homogeneous and tightly-coupled parallel machines, is not well suited to the more heterogeneous, loosely-coupled, and even fault-prone parallel systems available today. Specifically, PPS is hindered by synchronization penalties and cannot recover in the event of a failure. The authors introduce a new asynchronous and fault tolerant parallel pattern search (APPS) method and demonstrate its effectiveness on both simple test problems as well as some engineering optimization problems.
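
    A skeletal version of the asynchronous idea in Python's concurrent.futures, acting on each trial evaluation as it returns rather than waiting at a synchronization barrier. This is our simplification for illustration, not the authors' APPS implementation:

      import numpy as np
      from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

      def async_pattern_search(f, x0, step=1.0, tol=1e-6, max_evals=2000):
          dirs = [s * e for e in np.eye(len(x0)) for s in (1.0, -1.0)]
          best_x, best_f = np.asarray(x0, dtype=float), f(x0)
          evals, fails, pending = 1, 0, {}
          with ThreadPoolExecutor(max_workers=4) as pool:
              def launch():              # seed trials around the incumbent;
                  for d in dirs:         # stale in-flight trials keep running
                      xt = best_x + step * d
                      pending[pool.submit(f, xt)] = xt
              launch()
              while pending and evals < max_evals and step > tol:
                  done, _ = wait(pending, return_when=FIRST_COMPLETED)
                  for fut in done:
                      xt = pending.pop(fut)
                      evals += 1
                      if fut.result() < best_f:     # success: move, re-seed
                          best_f, best_x, fails = fut.result(), xt, 0
                          launch()
                      else:
                          fails += 1
                          if fails >= len(dirs):    # pattern failed: contract
                              step, fails = 0.5 * step, 0
                              launch()
          return best_x, best_f

      print(async_pattern_search(
          lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, [0.0, 0.0]))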

  7. Solution-phase parallel synthesis of a pharmacophore library of HUN-7293 analogues: a general chemical mutagenesis approach to defining structure-function properties of naturally occurring cyclic (depsi)peptides.

    PubMed

    Chen, Yan; Bilban, Melitta; Foster, Carolyn A; Boger, Dale L

    2002-05-15

    HUN-7293 (1), a naturally occurring cyclic heptadepsipeptide, is a potent inhibitor of cell adhesion molecule expression (VCAM-1, ICAM-1, E-selectin), the overexpression of which is characteristic of chronic inflammatory diseases. Representative of a general approach to defining structure-function relationships of such cyclic (depsi)peptides, the parallel synthesis and evaluation of a complete library of key HUN-7293 analogues are detailed, enlisting solution-phase techniques and simple acid-base liquid-liquid extractions for isolation and purification of intermediates and final products. Significant to the design of the studies and unique to solution-phase techniques, the library was assembled by superimposing a divergent synthetic strategy onto a convergent total synthesis. An alanine scan and N-methyl deletion of each residue of the cyclic heptadepsipeptide identified key sites responsible for or contributing to the biological properties. The simultaneous preparation of a complete set of individual residue analogues further simplifying the structure allowed an assessment of each structural feature of 1, providing a detailed account of the structure-function relationships in a single study. Within this pharmacophore library prepared by systematic chemical mutagenesis of the natural product structure, simplified analogues possessing comparable potency and, in some instances, improved selectivity were identified. One potent member of this library proved to be an additional natural product in its own right, which we have come to refer to as HUN-7293B (8), being isolated from the microbial strain F/94-499709. PMID:11996584

  8. NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Subhash, Saini; Bailey, David H.; Lasinski, T. A. (Technical Monitor)

    1995-01-01

    The NAS Parallel Benchmarks (NPB) were developed in 1991 at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a pencil and paper fashion, i.e., the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. In this paper, we present new NPB performance results for the following systems: (a) Parallel-Vector Processors: Cray C90, Cray T90 and Fujitsu VPP500; (b) Highly Parallel Processors: Cray T3D, IBM SP2 and IBM SP-TN2 (Thin Nodes 2); (c) Symmetric Multiprocessing Processors: Convex Exemplar SPP1000, Cray J90, DEC Alpha Server 8400 5/300, and SGI Power Challenge XL. We also present sustained performance per dollar for Class B LU, SP and BT benchmarks, and we outline NAS's future plans for the NPB.

  9. High performance parallel architectures

    SciTech Connect

    Anderson, R.E. )

    1989-09-01

    In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user programmer's point-of-view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.

  10. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  11. Parallel Multigrid Equation Solver

    Energy Science and Technology Software Center (ESTSC)

    2001-09-07

    Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved problems of up to 76 million degrees of freedom, problems in linear elasticity on the ASCI Blue Pacific and ASCI Red machines.

  12. Parallel Dislocation Simulator

    Energy Science and Technology Software Center (ESTSC)

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  13. Optical parallel selectionist systems

    NASA Astrophysics Data System (ADS)

    Caulfield, H. John

    1993-01-01

    There are at least two major classes of computers in nature and technology: connectionist and selectionist. A subset of connectionist systems (Turing Machines) dominates modern computing, although another subset (Neural Networks) is growing rapidly. Selectionist machines have unique capabilities which should allow them to do truly creative operations. It is possible to make a parallel optical selectionist system using methods described in this paper.

  14. Parallel fast gauss transform

    SciTech Connect

    Sampath, Rahul S; Sundar, Hari; Veerapaneni, Shravan

    2010-01-01

    We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N{sup 2}) time. The parallel time complexity estimates for our algorithms are O(N/n{sub p}) for uniform point distributions and O( (N/n{sub p}) log (N/n{sub p}) + n{sub p}log n{sub p}) for non-uniform distributions using n{sub p} CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits 'diagonal translation'. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer. Our implementation is 'kernel-independent' and can handle other 'Gaussian-type' kernels even when explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.
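
    The baseline against which those parallel complexity estimates are set is the direct O(N^2) sum, which in numpy is essentially a one-liner; at the 120 billion points quoted above it is hopeless, hence the octree/plane-wave machinery. A small sketch (names ours):

      import numpy as np

      def direct_gauss_transform(sources, targets, weights, h):
          """Direct sum: G(y_j) = sum_i w_i exp(-|y_j - x_i|^2 / h^2)."""
          d2 = ((targets[:, None, :] - sources[None, :, :]) ** 2).sum(-1)
          return np.exp(-d2 / h ** 2) @ weights

      rng = np.random.default_rng(0)
      x = rng.random((2000, 3))      # already forms a 2000 x 2000 kernel matrix
      print(direct_gauss_transform(x, x, rng.random(2000), h=0.5)[:3])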

  15. Parallel hierarchical global illumination

    SciTech Connect

    Snell, Q.O.

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  16. Parallel hierarchical radiosity rendering

    SciTech Connect

    Carter, M.

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  17. LEWICE droplet trajectory calculations on a parallel computer

    NASA Technical Reports Server (NTRS)

    Caruso, Steven C.

    1993-01-01

    A parallel computer implementation (128 processors) of LEWICE, a NASA Lewis code used to predict the time-dependent ice accretion process for two-dimensional aerodynamic bodies of simple geometries, is described. Two-dimensional parallel droplet trajectory calculations are performed to demonstrate the potential benefits of applying parallel processing to ice accretion analysis. Parallel performance is evaluated as a function of the number of trajectories and the number of processors. For comparison, similar trajectory calculations are performed on single-processor Cray computers, and the best parallel results are found to be 33 and 23 times faster, respectively, than those of the Cray XMP and YMP.
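
    Because each droplet trajectory is independent, the computation parallelizes with essentially no communication. A sketch of that pattern with a process pool follows; the drag model and constants are assumed for illustration and are far simpler than LEWICE's physics.

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def integrate_trajectory(y0, dt=1e-3, steps=1000):
            """Forward-Euler integration of one droplet in a uniform airflow."""
            pos = np.array(y0[:2], dtype=float)
            vel = np.array(y0[2:], dtype=float)
            air = np.array([50.0, 0.0])     # assumed freestream velocity, m/s
            k = 10.0                        # assumed drag constant, 1/s
            for _ in range(steps):
                vel += k * (air - vel) * dt  # drag relaxes droplet toward airflow
                pos += vel * dt
            return pos

        if __name__ == "__main__":
            starts = [(0.0, y, 0.0, 0.0) for y in np.linspace(-1, 1, 64)]
            with ProcessPoolExecutor() as pool:   # one trajectory per task
                endpoints = list(pool.map(integrate_trajectory, starts))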

  18. PARALLEL ASSAY OF OXYGEN EQUILIBRIA OF HEMOGLOBIN

    PubMed Central

    Lilly, Laura E.; Blinebry, Sara K.; Viscardi, Chelsea M.; Perez, Luis; Bonaventura, Joe; McMahon, Tim J.

    2013-01-01

    Methods to systematically analyze in parallel the function of multiple protein or cell samples in vivo or ex vivo (i.e. functional proteomics) in a controlled gaseous environment have thus far been limited. Here we describe an apparatus and procedure that enables, for the first time, parallel assay of oxygen equilibria in multiple samples. Using this apparatus, numerous simultaneous oxygen equilibrium curves (OECs) can be obtained under truly identical conditions from blood cell samples or purified hemoglobins (Hbs). We suggest that the ability to obtain these parallel datasets under identical conditions can be of immense value, both to biomedical researchers and clinicians who wish to monitor blood health, and to physiologists studying non-human organisms and the effects of climate change on these organisms. Parallel monitoring techniques are essential in order to better understand the functions of critical cellular proteins. The procedure can be applied to human studies, wherein an OEC can be analyzed in light of an individual’s entire genome. Here, we analyzed intraerythrocytic Hb, a protein that operates at the organism’s environmental interface and then comes into close contact with virtually all of the organism’s cells. The apparatus is theoretically scalable, and establishes a functional proteomic screen that can be correlated with genomic information on the same individuals. This new method is expected to accelerate our general understanding of protein function, an increasingly challenging objective as advances in proteomic and genomic throughput outpace the ability to study proteins’ functional properties. PMID:23827235

  19. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  20. A highly parallel signal processor

    NASA Astrophysics Data System (ADS)

    Bigham, Jackson D., Jr.

    There is an increasing need for signal processors functional across a broad range of problems, from radar systems to E-O and ESM applications. To meet this challenge, a signal processing system capable of efficiently meeting the processing requirements over a broad range of avionics sensor systems has been developed. The CDC Parallel Modular Signal Processor (PMSP) is a complete MIL-E-5400-qualified digital signal processing system capable of computation rates greater than 600 MOPS (million operations per second). The signal processing element of the PMSP is the Micro-AFP. It is an all-VLSI processor capable of executing multiple simultaneous operations. Up to five Micro-AFPs and 12 MB of main store memory (MSM), along with associated control and I/O functions, are contained in the PMSP's standard ATR enclosure.

  1. Hybrid Optimization Parallel Search PACKage

    Energy Science and Technology Software Center (ESTSC)

    2009-11-10

    HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.
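
    To give a flavor of the GSS family of algorithms, here is a toy generating set search that polls the 2n coordinate directions of each iterate in parallel and contracts the step on failure. This is a minimal sketch of the algorithm class, not the HOPSPACK API; the objective function and constants are made up.

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def objective(x):   # stand-in objective; no derivatives are used
            return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 0.5) ** 2

        def gss_minimize(x, step=1.0, tol=1e-6):
            n = len(x)
            dirs = np.vstack([np.eye(n), -np.eye(n)])   # generating set: +/- e_i
            fx = objective(x)
            with ProcessPoolExecutor() as pool:
                while step > tol:
                    trials = [x + step * d for d in dirs]
                    vals = list(pool.map(objective, trials))  # parallel evaluations
                    best = int(np.argmin(vals))
                    if vals[best] < fx:
                        x, fx = trials[best], vals[best]      # successful poll
                    else:
                        step *= 0.5                           # contract the step
            return x, fx

        if __name__ == "__main__":
            print(gss_minimize(np.zeros(2)))   # converges near (1.0, -0.5)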

  2. Parallel Subconvolution Filtering Architectures

    NASA Technical Reports Server (NTRS)

    Gray, Andrew A.

    2003-01-01

    These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete- Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than by the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
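
    The overlap-and-save building block these architectures rest on can be stated compactly. A NumPy sketch follows; the block length and filter are arbitrary choices for illustration, and the report's designs go further by splitting long filters into frequency-domain subfilters.

        import numpy as np

        def overlap_save(x, h, nfft=256):
            m = len(h)              # filter length; nfft must exceed m - 1
            hop = nfft - (m - 1)    # new samples consumed per block
            H = np.fft.fft(h, nfft)
            xp = np.concatenate([np.zeros(m - 1), x])
            out = []
            for start in range(0, len(x), hop):
                block = xp[start:start + nfft]
                if len(block) < nfft:
                    block = np.pad(block, (0, nfft - len(block)))
                yb = np.fft.ifft(np.fft.fft(block) * H).real
                out.append(yb[m - 1:])   # discard circularly wrapped samples
            return np.concatenate(out)[:len(x)]

        x = np.random.default_rng(1).standard_normal(4096)
        h = np.ones(64) / 64.0           # simple moving-average filter
        assert np.allclose(overlap_save(x, h), np.convolve(x, h)[:len(x)], atol=1e-9)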

  3. Parallel Anisotropic Tetrahedral Adaptation

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Darmofal, David L.

    2008-01-01

    An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic are described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000:1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.

  4. Homology, convergence and parallelism.

    PubMed

    Ghiselin, Michael T

    2016-01-01

    Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. PMID:26598721

  5. Parallel grid population

    DOEpatents

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
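
    Schematically, the claimed method has two bulk-parallel phases, sketched below with plain loops standing in for the n processors; the slab-shaped grid portions are an assumption made for illustration.

        import numpy as np

        n = 4                                      # number of processors/portions
        rng = np.random.default_rng(2)
        objects = rng.random((1000, 2))            # point objects in [0,1)^2
        obj_sets = np.array_split(objects, n)      # one distinct object set each

        # Phase 1: each "processor" decides which grid portion bounds each of
        # its objects. Portions here are hypothetical vertical slabs.
        bins = [[] for _ in range(n)]
        for subset in obj_sets:                    # conceptually parallel
            for obj in subset:
                bins[min(int(obj[0] * n), n - 1)].append(obj)

        # Phase 2: each "processor" populates its own grid portion from its bin.
        portions = [np.array(b) for b in bins]     # conceptually parallel
        assert sum(len(p) for p in portions) == len(objects)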

  6. Seeing in parallel

    SciTech Connect

    Little, J.J.; Poggio, T.; Gamble, E.B. Jr.

    1988-01-01

    Computer algorithms have been developed for early vision processes that give separate cues to the distance from the viewer of three-dimensional surfaces, their shape, and their material properties. The MIT Vision Machine is a computer system that integrates several early vision modules to achieve high-performance recognition and navigation in unstructured environments. It is also an experimental environment for theoretical progress in early vision algorithms, their parallel implementation, and their integration. The Vision Machine consists of a movable, two-camera Eye-Head input device and an 8K Connection Machine. The authors have developed and implemented several parallel early vision algorithms that compute edge detection, stereopsis, motion, texture, and surface color in close to real time. The integration stage, based on coupled Markov random field models, leads to a cartoon-like map of the discontinuities in the scene, with partial labeling of the brightness edges in terms of their physical origin.

  7. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Gryphon, Coranth D.; Miller, Mark D.

    1991-01-01

    PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.

  8. Parallel multilevel preconditioners

    SciTech Connect

    Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.

    1989-01-01

    In this paper, we shall report on some techniques for the development of preconditioners for the discrete systems which arise in the approximation of solutions to elliptic boundary value problems. Here we shall only state the resulting theorems. It has been demonstrated that preconditioned iteration techniques often lead to the most computationally effective algorithms for the solution of the large algebraic systems corresponding to boundary value problems in two and three dimensional Euclidean space. The use of preconditioned iteration will become even more important on computers with parallel architecture. This paper discusses an approach for developing completely parallel multilevel preconditioners. In order to illustrate the resulting algorithms, we shall describe the simplest application of the technique to a model elliptic problem.

  9. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Painter, J.; Hansen, C.

    1996-10-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.

  10. Xyce parallel electronic simulator.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  11. ASSEMBLY OF PARALLEL PLATES

    DOEpatents

    Groh, E.F.; Lennox, D.H.

    1963-04-23

    This invention is concerned with a rigid assembly of parallel plates in which keyways are stamped out along the edges of the plates and a self-retaining key is inserted into aligned keyways. Spacers having similar keyways are included between adjacent plates. The entire assembly is locked into a rigid structure by fastening only the outermost plates to the ends of the keys. (AEC)

  12. Adaptive parallel logic networks

    SciTech Connect

    Martinez, T.R.; Vidal, J.J.

    1988-02-01

    This paper presents a novel class of special purpose processors referred to as ASOCS (adaptive self-organizing concurrent systems). Intended applications include adaptive logic devices, robotics, process control, system malfunction management, and in general, applications of logical reasoning. ASOCS combines massive parallelism with self-organization to attain a distributed mechanism for adaptation. The ASOCS approach is based on an adaptive network composed of many simple computing elements (nodes) which operate in a combinational and asynchronous fashion. Problem specification (programming) is obtained by presenting to the system if-then rules expressed as Boolean conjunctions. New rules are added incrementally. In the current model, when conflicts occur, precedence is given to the most recent inputs. With each rule, the desired network response is simply presented to the system, following which the network adjusts itself to maintain consistency and parsimony of representation. Data processing and adaptation form two separate phases of operation. During processing, the network acts as a parallel hardware circuit. Control of the adaptive process is distributed among the network nodes and efficiently exploits parallelism.

  13. Trajectory optimization using parallel shooting method on parallel computer

    SciTech Connect

    Wirthman, D.J.; Park, S.Y.; Vadali, S.R.

    1995-03-01

    The efficiency of a parallel shooting method on a parallel computer for solving a variety of optimal control guidance problems is studied. Several examples are considered to demonstrate that a speedup of nearly 7 to 1 is achieved with the use of 16 processors. It is suggested that further improvements in performance can be achieved by parallelizing in the state domain. 10 refs.

  14. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    SciTech Connect

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving, only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  15. Global Arrays Parallel Programming Toolkit

    SciTech Connect

    Nieplocha, Jaroslaw; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Harrison, Robert J.; Chavarría-Miranda, Daniel

    2011-01-01

    The two predominant classes of programming models for parallel computing are distributed memory and shared memory. Both shared memory and distributed memory models have advantages and shortcomings. The shared memory model is much easier to use, but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic can have a negative impact on performance and scalability. Careful code restructuring to increase data reuse and replacing fine-grain load/stores with block access to shared data can address the problem and yield performance for shared memory that is competitive with message-passing. However, this performance comes at the cost of compromising the ease of use that the shared memory model advertises. Distributed memory models, such as message-passing or one-sided communication, offer performance and scalability but they are difficult to program. The Global Arrays toolkit attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be specified by the programmer and hence managed. GA is related to the global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran. In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF, ZPL, and Data Parallel C. However, the Global Array programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving parallel efficiency.
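
    The programming model reduces to an explicit get/compute/put cycle against a globally addressable array. Below is a toy Python stand-in for that pattern; the class and method names are hypothetical, since the real toolkit exposes C/Fortran library calls rather than this API.

        import numpy as np

        class GlobalArray:
            """Toy stand-in for a distributed array with get/put semantics."""
            def __init__(self, n):
                self.data = np.zeros(n)        # would be distributed in GA

            def get(self, lo, hi):             # global -> local copy
                return self.data[lo:hi].copy()

            def put(self, lo, local):          # local -> global copy
                self.data[lo:lo + len(local)] = local

        ga = GlobalArray(1_000)
        block = ga.get(0, 100)     # fetch only the locally needed patch
        block += 1.0               # compute on fast local storage
        ga.put(0, block)           # write the result back to the global space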

  16. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.

  17. Resistor Combinations for Parallel Circuits.

    ERIC Educational Resources Information Center

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
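
    Such tables are easy to regenerate: two resistors in parallel combine as R = R1*R2/(R1 + R2), so the total is a whole number exactly when R1 + R2 divides R1*R2. A short sketch of that search over 1-100 ohm values:

        pairs = [(r1, r2)
                 for r1 in range(1, 101)
                 for r2 in range(r1, 101)
                 if (r1 * r2) % (r1 + r2) == 0]
        for r1, r2 in pairs[:5]:
            print(f"{r1} || {r2} = {r1 * r2 // (r1 + r2)}")   # e.g. 3 || 6 = 2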

  18. Parallel Pascal - An extended Pascal for parallel computers

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.

    1984-01-01

    Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.

  19. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Hansen, C.; Painter, J.; de Verdiere, G.C.

    1995-05-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.
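
    The divide-render-composite structure can be shown in miniature: each worker renders only its share of the spheres into a partial image with a depth buffer, and compositing keeps the nearest sample per pixel. The flat-shaded orthographic toy renderer below illustrates the pattern and is not the paper's algorithm.

        import numpy as np

        W = H = 128

        def render_partial(spheres):
            color = np.zeros((H, W))
            depth = np.full((H, W), np.inf)
            ys, xs = np.mgrid[0:H, 0:W]
            for cx, cy, cz, r, shade in spheres:
                d2 = (xs - cx) ** 2 + (ys - cy) ** 2
                hit = d2 <= r * r
                z = cz - np.sqrt(np.maximum(r * r - d2, 0.0))  # front surface
                closer = hit & (z < depth)
                color[closer] = shade
                depth[closer] = z[closer]
            return color, depth

        rng = np.random.default_rng(3)
        spheres = np.column_stack([rng.uniform(0, W, 40), rng.uniform(0, H, 40),
                                   rng.uniform(50, 100, 40), rng.uniform(5, 15, 40),
                                   rng.uniform(0.2, 1.0, 40)])
        # Four "nodes" each render an independent partial image.
        partials = [render_partial(s) for s in np.array_split(spheres, 4)]

        # Composite: per pixel, keep the sample with the smallest depth.
        depths = np.stack([d for _, d in partials])
        colors = np.stack([c for c, _ in partials])
        nearest = depths.argmin(axis=0)
        final = np.take_along_axis(colors, nearest[None], axis=0)[0]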

  20. Parallel Eclipse Project Checkout

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.

    2011-01-01

    Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (xml file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to check out the plug-ins, this program makes the plug-in checkouts parallel. After parsing the feature, a checkout request is issued for each plug-in in the feature. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to check out now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer's home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer's home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. It can be applied to any Eclipse-based project.
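
    Stripped to its core, the innovation is a bounded thread pool fed by the plug-in list parsed from the feature description. A sketch under stated assumptions follows: the repository URL and the bare "svn checkout" command are hypothetical stand-ins, since PEPC itself drives Eclipse's checkout machinery.

        from concurrent.futures import ThreadPoolExecutor
        import subprocess
        import xml.etree.ElementTree as ET

        def plugins_in_feature(feature_xml):
            """Collect plug-in ids from an Eclipse feature.xml file."""
            root = ET.parse(feature_xml).getroot()
            return [p.get("id") for p in root.iter("plugin")]

        def checkout(plugin_id):
            # Stand-in for checking out one plug-in from the repository.
            subprocess.run(
                ["svn", "checkout", f"https://repo.example/{plugin_id}"],
                check=True)

        def parallel_checkout(feature_xml, threads=8):
            # Bounded pool so downloads overlap and saturate the link.
            with ThreadPoolExecutor(max_workers=threads) as pool:
                list(pool.map(checkout, plugins_in_feature(feature_xml)))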

  1. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures are best suited to them. The two classes designated multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

  2. Fastpath Speculative Parallelization

    NASA Astrophysics Data System (ADS)

    Spear, Michael F.; Kelsey, Kirk; Bai, Tongxin; Dalessandro, Luke; Scott, Michael L.; Ding, Chen; Wu, Peng

    We describe Fastpath, a system for speculative parallelization of sequential programs on conventional multicore processors. Our system distinguishes between the lead thread, which executes at almost-native speed, and speculative threads, which execute somewhat slower. This allows us to achieve nontrivial speedup, even on two-core machines. We present a mathematical model of potential speedup, parameterized by application characteristics and implementation constants. We also present preliminary results gleaned from two different Fastpath implementations, each derived from an implementation of software transactional memory.

  3. CSM parallel structural methods research

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.

    1989-01-01

    Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.

  4. Synchronous Parallel Kinetic Monte Carlo

    SciTech Connect

    Martínez, E; Marian, J; Kalos, M H

    2006-12-14

    A novel parallel kinetic Monte Carlo (kMC) algorithm formulated on the basis of perfect time synchronicity is presented. The algorithm provides an exact generalization of any standard serial kMC model and is trivially implemented in parallel architectures. We demonstrate the mathematical validity and parallel performance of the method by solving several well-understood problems in diffusion.

  5. Roo: A parallel theorem prover

    SciTech Connect

    Lusk, E.L.; McCune, W.W.; Slaney, J.K.

    1991-11-01

    We describe a parallel theorem prover based on the Argonne theorem-proving system OTTER. The parallel system, called Roo, runs on shared-memory multiprocessors such as the Sequent Symmetry. We explain the parallel algorithm used and give performance results that demonstrate near-linear speedups on large problems.

  6. Programming parallel architectures - The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  7. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution in which one directly executes the application code but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  8. Sequential and Parallel Algorithms for Spherical Interpolation

    NASA Astrophysics Data System (ADS)

    De Rossi, Alessandra

    2007-09-01

    Given a large set of scattered points on a sphere and their associated real values, we analyze sequential and parallel algorithms for the construction of a function defined on the sphere satisfying the interpolation conditions. The algorithms we implemented are based on a local interpolation method using spherical radial basis functions and the Inverse Distance Weighted method. Several numerical results show accuracy and efficiency of the algorithms.
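
    Of the two methods, Inverse Distance Weighted interpolation is the simpler to state: weight each sample by an inverse power of its great-circle distance to the evaluation point. A compact sketch follows; the test function and the power are arbitrary choices for illustration.

        import numpy as np

        def gc_dist(p, q):
            """Great-circle distance between unit vectors p (n,3) and q (3,)."""
            return np.arccos(np.clip(p @ q, -1.0, 1.0))

        def idw(points, values, target, power=2.0, eps=1e-12):
            d = gc_dist(points, target)
            if d.min() < eps:            # target coincides with a sample node
                return values[d.argmin()]
            w = 1.0 / d ** power         # inverse-distance weights
            return w @ values / w.sum()

        rng = np.random.default_rng(4)
        pts = rng.standard_normal((200, 3))
        pts /= np.linalg.norm(pts, axis=1, keepdims=True)  # scatter on sphere
        vals = pts[:, 2]                                   # sample f = z
        t = np.array([0.0, 0.0, 1.0])
        print(idw(pts, vals, t))         # a rough estimate of f(t) = 1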

  9. Imaging with parallel ray-rotation sheets.

    PubMed

    Hamilton, Alasdair C; Courtial, Johannes

    2008-12-01

    A ray-rotation sheet consists of miniaturized optical components that function--ray optically--as a homogeneous medium that rotates the local direction of transmitted light rays around the sheet normal by an arbitrary angle [A. C. Hamilton et al., arXiv:0809.2646 (2008)]. Here we show that two or more parallel ray-rotation sheets perform imaging between two planes. The image is unscaled and un-rotated. No other planes are imaged. When seen through parallel ray-rotation sheets, planes that are not imaged appear rotated. PMID:19065221

  10. Electron parallel closures for arbitrary collisionality

    SciTech Connect

    Ji, Jeong-Young Held, Eric D.

    2014-12-15

    Electron parallel closures for heat flow, viscosity, and friction force are expressed as kernel-weighted integrals of thermodynamic drives, the temperature gradient, relative electron-ion flow velocity, and flow-velocity gradient. Simple, fitted kernel functions are obtained for arbitrary collisionality from the 6400 moment solution and the asymptotic behavior in the collisionless limit. The fitted kernels circumvent having to solve higher order moment equations in order to close the electron fluid equations. For this reason, the electron parallel closures provide a useful and general tool for theoretical and computational models of astrophysical and laboratory plasmas.
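
    Schematically, each closure is a convolution of a fitted kernel with a thermodynamic drive along the field line. For the heat-flow case the form is sketched below in LaTeX; the notation is assumed for illustration and is not taken verbatim from the paper.

        q_\parallel(\ell) \;=\; -\, n_e \int K_q(\ell - \ell') \,
            \frac{\partial T_e}{\partial \ell'} \, d\ell'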

  11. Constructing higher order DNA origami arrays using DNA junctions of anti-parallel/parallel double crossovers

    NASA Astrophysics Data System (ADS)

    Ma, Zhipeng; Park, Seongsu; Yamashita, Naoki; Kawai, Kentaro; Hirai, Yoshikazu; Tsuchiya, Toshiyuki; Tabata, Osamu

    2016-06-01

    DNA origami provides a versatile method for the construction of nanostructures with defined shape, size and other properties; such nanostructures may enable a hierarchical assembly of large scale architecture for the placement of other nanomaterials with atomic precision. However, the effective use of these higher order structures as functional components depends on knowledge of their assembly behavior and mechanical properties. This paper demonstrates construction of higher order DNA origami arrays with controlled orientations based on the formation of two types of DNA junctions: anti-parallel and parallel double crossovers. A two-step assembly process, in which preformed rectangular DNA origami monomer structures themselves undergo further self-assembly to form numerically unlimited arrays, was investigated to reveal the influences of assembly parameters. AFM observations showed that when parallel double crossover DNA junctions are used, the assembly of DNA origami arrays occurs with fewer monomers than for structures formed using anti-parallel double crossovers, given the same assembly parameters, indicating that the configuration of parallel double crossovers is not energetically preferred. However, direct measurement by AFM force-controlled mapping shows that both the anti-parallel and parallel double-crossover junctions have mechanical stability comparable to that of any other part of the DNA origami.

  12. Tolerant (parallel) Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2^3 is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

  13. Massively Parallel QCD

    SciTech Connect

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampapa, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-04-11

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.

  14. Making parallel lines meet

    PubMed Central

    Baskin, Tobias I.; Gu, Ying

    2012-01-01

    The extracellular matrix is constructed beyond the plasma membrane, challenging mechanisms for its control by the cell. In plants, the cell wall is highly ordered, with cellulose microfibrils aligned coherently over a scale spanning hundreds of cells. To a considerable extent, deploying aligned microfibrils determines mechanical properties of the cell wall, including strength and compliance. Cellulose microfibrils have long been seen to be aligned in parallel with an array of microtubules in the cell cortex. How do these cortical microtubules affect the cellulose synthase complex? This question has stood for as many years as the parallelism between the elements has been observed, but now an answer is emerging. Here, we review recent work establishing that the link between microtubules and microfibrils is mediated by a protein named cellulose synthase-interacting protein 1 (CSI1). The protein binds both microtubules and components of the cellulose synthase complex. In the absence of CSI1, microfibrils are synthesized but their alignment becomes uncoupled from the microtubules, an effect that is phenocopied in the wild type by depolymerizing the microtubules. The characterization of CSI1 significantly enhances knowledge of how cellulose is aligned, a process that serves as a paradigmatic example of how cells dictate the construction of their extracellular environment. PMID:22902763

  15. Applied Parallel Metadata Indexing

    SciTech Connect

    Jacobi, Michael R

    2012-08-01

    The GPFS Archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and search the archive using standard and user-defined metadata, while maintaining security. Last summer, I developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, I implemented security and developed the user interface for the search tool. To meet security requirements, each database table is associated with a single user, stores only records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.
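
    The backend pattern described above is small enough to sketch with pymongo: import metadata records into a per-user collection, index each attribute, and answer attribute queries. The field names and records here are hypothetical, and a local MongoDB instance is assumed; the real tool additionally imports GPFS metadata, enforces credentials, and adds the FUSE front end.

        from pymongo import MongoClient

        client = MongoClient("mongodb://localhost:27017")
        files = client["archive"]["user_alice"]   # one collection per user

        files.insert_many([
            {"path": "/archive/run1/out.h5", "size": 4096, "tags": ["run1", "hdf5"]},
            {"path": "/archive/run2/out.h5", "size": 8192, "tags": ["run2", "hdf5"]},
        ])
        for field in ("path", "size", "tags"):    # index every searchable attribute
            files.create_index(field)

        print(list(files.find({"tags": "hdf5", "size": {"$gt": 4096}})))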

  16. Parallel ptychographic reconstruction

    PubMed Central

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-01-01

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by the scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source. PMID:25607174

  17. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.

  18. Parallel Algorithms For Optical Digital Computers

    NASA Astrophysics Data System (ADS)

    Huang, Alan

    1983-04-01

    Conventional computers suffer from several communication bottlenecks which fundamentally limit their performance. These bottlenecks are characterized by an address-dependent sequential transfer of information which arises from the need to time-multiplex information over a limited number of interconnections. An optical digital computer based on a classical finite state machine can be shown to be free of these bottlenecks. Such a processor would be unique since it would be capable of modifying its entire state space each cycle while conventional computers can only alter a few bits. New algorithms are needed to manage and use this capability. A technique based on recognizing a particular symbol in parallel and replacing it in parallel with another symbol is suggested. Examples using this parallel symbolic substitution to perform binary addition and binary incrementation are presented. Applications involving Boolean logic, functional programming languages, production rule driven artificial intelligence, and molecular chemistry are also discussed.
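
    Huang's binary-addition example boils down to two substitution rules applied to every bit column at once: each column (a_i, b_i) is rewritten in place as the XOR bit, while the AND bit is carried one column to the left. Iterating that step until no carries remain is ordinary carry-save addition, as this sketch shows:

        def symbolic_substitution_add(a, b):
            while b:                     # repeat until no carries remain
                xor = a ^ b              # rule 1: in-place sum bit per column
                carry = (a & b) << 1     # rule 2: carry bit shifted one column left
                a, b = xor, carry        # every column rewritten simultaneously
            return a

        assert symbolic_substitution_add(0b1011, 0b0110) == 0b10001   # 11 + 6 = 17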

  19. Simulating Billion-Task Parallel Programs

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J

    2014-01-01

    In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.

  20. Grundy - Parallel processor architecture makes programming easy

    NASA Technical Reports Server (NTRS)

    Meier, R. J., Jr.

    1985-01-01

    The hardware, software, and firmware of the parallel processor, Grundy, are examined. Grundy uses a simple processor that has a totally orthogonal three-address instruction set. The system contains relative and indirect processing modes to support the high-level language, and uses pseudoprocessors and read-only memory. The system supports a high-level language in which arbitrary degrees of algorithmic parallelism are expressed. The functions of the compiler and invocation frame are described. Grundy uses an operating system that can be accessed by an arbitrary number of processes simultaneously, and the access time grows only as the logarithm of the number of active processes. Applications for the parallel processor are discussed.

  1. Parallel Mechanisms for Visual Search in Zebrafish

    PubMed Central

    Proulx, Michael J.; Parker, Matthew O.; Tahir, Yasser; Brennan, Caroline H.

    2014-01-01

    Parallel visual search mechanisms have been reported previously only in mammals and birds, and not animals lacking an expanded telencephalon such as bees. Here we report the first evidence for parallel visual search in fish using a choice task where the fish had to find a target amongst an increasing number of distractors. Following two-choice discrimination training, zebrafish were presented with the original stimulus within an increasing array of distractor stimuli. We found that zebrafish exhibit no significant change in accuracy and approach latency as the number of distractors increased, providing evidence of parallel processing. This evidence challenges theories of vertebrate neural architecture and the importance of an expanded telencephalon for the evolution of executive function. PMID:25353168

  2. Extending HPF for advanced data parallel applications

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

    1994-01-01

    The stated goal of High Performance Fortran (HPF) was to 'address the problems of writing data parallel programs where the distribution of data affects performance'. After examining the current version of the language we are led to the conclusion that HPF has not fully achieved this goal. While the basic distribution functions offered by the language - regular block, cyclic, and block cyclic distributions - can support regular numerical algorithms, advanced applications such as particle-in-cell codes or unstructured mesh solvers cannot be expressed adequately. We believe that this is a major weakness of HPF, significantly reducing its chances of becoming accepted in the numeric community. The paper discusses the data distribution and alignment issues in detail, points out some flaws in the basic language, and outlines possible future paths of development. Furthermore, we briefly deal with the issue of task parallelism and its integration with the data parallel paradigm of HPF.

  3. Loop parallelism on Tera MTA using SISAL

    SciTech Connect

    Mitrovic, S.

    1995-11-01

    The difficulty of programming parallel computers has impeded their widespread use. The problems are caused by existing hardware and software tools. The software problems on shared-memory and vector computers can be solved by using deterministic high-performance functional languages like SISAL. Distributed-memory computers have even more obstacles than shared-memory parallel machines. Research indicates that multithreaded architectures can hide the long latency of distributed memories and that they can solve the problems of locality. Tera's MTA multiprocessor is based on the concept of multithreading and provides the programmer with a real shared-memory model. This paper investigates the performance of parallel loops written in SISAL and executed on the Tera MTA using the Livermore Loops benchmarks.

  4. Parallel algorithms for optical digital computers

    SciTech Connect

    Huang, A.

    1983-01-01

    Conventional computers suffer from several communication bottlenecks which fundamentally limit their performance. These bottlenecks are characterised by an address-dependent sequential transfer of information which arises from the need to time-multiplex information over a limited number of interconnections. An optical digital computer based on a classical finite state machine can be shown to be free of these bottlenecks. Such a processor would be unique since it would be capable of modifying its entire state space each cycle while conventional computers can only alter a few bits. New algorithms are needed to manage and use this capability. A technique based on recognising a particular symbol in parallel and replacing it in parallel with another symbol is suggested. Examples using this parallel symbolic substitution to perform binary addition and binary incrementation are presented. Applications involving Boolean logic, functional programming languages, production rule driven artificial intelligence, and molecular chemistry are also discussed. 12 references.

  5. A systolic array parallelizing compiler

    SciTech Connect

    Tseng, P.S.

    1990-01-01

    This book presents a completely new approach to the problem of building a systolic array parallelizing compiler. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler which can generate efficient parallel code for complete LINPACK routines. This book begins by analyzing the architectural strength of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. A complete listing of the source program and compiler-generated parallel code are given to clarify the overall picture of the compiler. The book concludes that a systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.

  6. Detecting opportunities for parallel observations on the Hubble Space Telescope

    NASA Technical Reports Server (NTRS)

    Lucks, Michael

    1992-01-01

    The presence of multiple scientific instruments aboard the Hubble Space Telescope provides opportunities for parallel science, i.e., the simultaneous use of different instruments for different observations. Determining whether candidate observations are suitable for parallel execution depends on numerous criteria (some involving quantitative tradeoffs) that may change frequently. A knowledge-based approach is presented for constructing a scoring function to rank candidate pairs of observations for parallel science. In the Parallel Observation Matching System (POMS), spacecraft knowledge and schedulers' preferences are represented using a uniform set of mappings, or knowledge functions. Assessment of parallel science opportunities is achieved via composition of the knowledge functions in a prescribed manner. The knowledge acquisition and explanation facilities of the system are presented. The methodology is applicable to many other multiple-criteria assessment problems.
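
    The composition idea is simple to sketch: each criterion is a knowledge function mapping a candidate pair of observations to [0, 1], and the overall score is their weighted sum, so schedulers can retune criteria without touching the matcher. The criteria and weights below are hypothetical, not POMS's actual rules.

        def same_pointing(obs1, obs2):
            return 1.0 if obs1["target"] == obs2["target"] else 0.0

        def duration_overlap(obs1, obs2):
            short, long_ = sorted((obs1["minutes"], obs2["minutes"]))
            return short / long_             # 1.0 when the durations match

        # The scoring function is just a weighted composition of criteria.
        knowledge_functions = [(same_pointing, 0.7), (duration_overlap, 0.3)]

        def parallel_score(obs1, obs2):
            return sum(w * f(obs1, obs2) for f, w in knowledge_functions)

        a = {"target": "M31", "minutes": 30}
        b = {"target": "M31", "minutes": 45}
        print(parallel_score(a, b))          # 0.7 + 0.3 * (30/45) = 0.9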

  7. Parallel Computing in SCALE

    SciTech Connect

    DeHart, Mark D; Williams, Mark L; Bowman, Stephen M

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to these near-term improvements, a plan for longer-term SCALE enhancement is also outlined.

  8. Unified Parallel Software

    SciTech Connect

    McKay, Mike

    2003-12-01

    UPS (Unified Parallel Software) is a collection of software tools (libraries, scripts, executables) that assist in parallel programming. It consists of:
    o libups.a: C/Fortran-callable routines for message passing (utilities written on top of MPI) and file I/O (utilities written on top of HDF).
    o libuserd-HDF.so: an EnSight user-defined reader for visualizing data files written with UPS File IO.
    o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl: executables/scripts to get information from data files and to simplify the use of EnSight on those data files.
    o ups_io_rm/ups_io_cp: utilities to manipulate data files written with UPS File IO.
    These tools are portable to a wide variety of Unix platforms.

  9. Parallel Polarization State Generation

    PubMed Central

    She, Alan; Capasso, Federico

    2016-01-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially separated polarization components of a laser with a digital micromirror device and subsequently beam-combining them. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813
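
    A minimal NumPy sketch of the sum-of-matrices idea, assuming incoherent beam combination so that the Stokes vectors of the spatially separated components simply add (the matrices and modulation weights are illustrative, not the authors' apparatus):

      import numpy as np

      # Mueller matrices of two ideal linear polarizers (horizontal and +45 deg)
      M_H  = 0.5 * np.array([[1, 1, 0, 0], [1, 1, 0, 0],
                             [0, 0, 0, 0], [0, 0, 0, 0]])
      M_45 = 0.5 * np.array([[1, 0, 1, 0], [0, 0, 0, 0],
                             [1, 0, 1, 0], [0, 0, 0, 0]])
      s_in = np.array([1.0, 0.0, 0.0, 0.0])   # unpolarized input beam

      def output_stokes(w1, w2):
          # parallel architecture: a SUM of matrices weighted by time-varying
          # intensity modulations (e.g. from a digital micromirror device)
          return (w1 * M_H + w2 * M_45) @ s_in

      print(output_stokes(1.0, 0.0))   # purely horizontal component
      print(output_stokes(0.5, 0.5))   # equal mixture of the two components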

  10. Unified Parallel Software

    Energy Science and Technology Software Center (ESTSC)

    2003-12-01

    UPS (Unified Parallel Software) is a collection of software tools (libraries, scripts, executables) that assist in parallel programming. It consists of:
    o libups.a: C/Fortran-callable routines for message passing (utilities written on top of MPI) and file I/O (utilities written on top of HDF).
    o libuserd-HDF.so: an EnSight user-defined reader for visualizing data files written with UPS File IO.
    o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl: executables/scripts to get information from data files and to simplify the use of EnSight on those data files.
    o ups_io_rm/ups_io_cp: utilities to manipulate data files written with UPS File IO.
    These tools are portable to a wide variety of Unix platforms.

  11. Parallel Polarization State Generation

    NASA Astrophysics Data System (ADS)

    She, Alan; Capasso, Federico

    2016-05-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially separated polarization components of a laser with a digital micromirror device and subsequently beam-combining them. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.

  12. Parallel Polarization State Generation.

    PubMed

    She, Alan; Capasso, Federico

    2016-01-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially separated polarization components of a laser with a digital micromirror device and subsequently beam-combining them. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813

  13. Parallel tridiagonal equation solvers

    NASA Technical Reports Server (NTRS)

    Stone, H. S.

    1974-01-01

    Three parallel algorithms were compared for the direct solution of tridiagonal linear systems of equations. The algorithms are suitable for computers such as ILLIAC 4 and CDC STAR. For array computers similar to ILLIAC 4, cyclic odd-even reduction has the least operation count for highly structured sets of equations, and recursive doubling has the least count for relatively unstructured sets of equations. Since the difference in operation counts for these two algorithms is not substantial, their relative running times may be more related to overhead operations, which are not measured in this paper. The third algorithm, based on Buneman's Poisson solver, has more arithmetic operations than the others, and appears to be the least favorable. For pipeline computers similar to CDC STAR, cyclic odd-even reduction appears to be the most preferable algorithm for all cases.
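
    For concreteness, here is a serial NumPy sketch of cyclic odd-even reduction; each inner loop over rows consists of independent updates, which is exactly what an array machine would execute simultaneously (a generic textbook formulation, not code from the paper):

      import numpy as np

      def cyclic_reduction(a, b, c, d):
          # Tridiagonal solve with sub-, main-, and super-diagonals a, b, c
          # (a[0] = c[-1] = 0) and right-hand side d; n must be 2**k - 1.
          a, b, c, d = (np.asarray(v, float).copy() for v in (a, b, c, d))
          n = len(b)
          levels, s = [], 1
          while 2 * s < n + 1:                       # forward reduction
              levels.append(s)
              for i in range(2 * s - 1, n, 2 * s):   # independent rows
                  al, be = a[i] / b[i - s], c[i] / b[i + s]
                  b[i] -= al * c[i - s] + be * a[i + s]
                  d[i] -= al * d[i - s] + be * d[i + s]
                  a[i], c[i] = -al * a[i - s], -be * c[i + s]
              s *= 2
          x = np.zeros(n)
          m = (n - 1) // 2
          x[m] = d[m] / b[m]                 # middle equation is decoupled
          for s in reversed(levels):         # back substitution, row-parallel
              for i in range(s - 1, n, 2 * s):
                  left = x[i - s] if i - s >= 0 else 0.0
                  right = x[i + s] if i + s < n else 0.0
                  x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
          return x

      rng = np.random.default_rng(0)
      n = 15
      b = 4.0 + rng.random(n)
      a, c = -rng.random(n), -rng.random(n)
      a[0] = c[-1] = 0.0
      d = rng.random(n)
      x = cyclic_reduction(a, b, c, d)
      A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
      assert np.allclose(A @ x, d)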

  14. Parallel Imaging Microfluidic Cytometer

    PubMed Central

    Ehrlich, Daniel J.; McKenna, Brian K.; Evans, James G.; Belkina, Anna C.; Denis, Gerald V.; Sherr, David; Cheung, Man Ching

    2011-01-01

    By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of flow cytometry (FACS) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1-D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in approximately 6–10 minutes, about 30 times the speed of most current FACS systems. In 1-D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of CCD-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. PMID:21704835

  15. The parallel I/O architecture of the high performance storage system (HPSS). Revision 1

    SciTech Connect

    Watson, R.W.; Coyne, R.A.

    1995-04-01

    Datasets up to terabyte size and petabyte capacities have created a serious imbalance between I/O and storage system performance and system functionality. One promising approach is the use of parallel data transfer techniques for client access to storage, peripheral-to-peripheral transfers, and remote file transfers. This paper describes the parallel I/O architecture and mechanisms, Parallel Transport Protocol (PTP), parallel FTP, and parallel client Application Programming Interface (API) used by the High Performance Storage System (HPSS). Parallel storage integration issues with a local parallel file system are also discussed.

  16. Parallelizing OVERFLOW: Experiences, Lessons, Results

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.

    1999-01-01

    The computer code OVERFLOW is widely used in the aerodynamic community for the numerical solution of the Navier-Stokes equations. Current trends in computer systems and architectures are toward multiple processors and parallelism, including distributed memory. This report describes work that has been carried out by the author and others at Ames Research Center with the goal of parallelizing OVERFLOW using a variety of parallel architectures and parallelization strategies. This paper begins with a brief description of the OVERFLOW code. This description includes the basic numerical algorithm and some software engineering considerations. Next comes a description of a parallel version of OVERFLOW, OVERFLOW/PVM, using PVM (Parallel Virtual Machine). This parallel version of OVERFLOW uses the manager/worker style and is part of the standard OVERFLOW distribution. Then comes a description of a parallel version of OVERFLOW, OVERFLOW/MPI, using MPI (Message Passing Interface). This parallel version of OVERFLOW uses the SPMD (Single Program Multiple Data) style. Finally comes a discussion of alternatives to explicit message-passing in the context of parallelizing OVERFLOW.
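
    A minimal mpi4py sketch of the manager/worker arrangement described above (the block count and per-block solve are placeholders, not OVERFLOW code):

      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()
      NBLOCKS = 20                      # placeholder: number of grid blocks

      def solve_block(block_id):        # placeholder for the per-block solve
          return block_id * block_id

      if rank == 0:                     # manager: assign blocks, collect results
          status = MPI.Status()
          next_block, active = 0, comm.Get_size() - 1
          while active:
              comm.recv(source=MPI.ANY_SOURCE, status=status)  # "ready"/"done"
              if next_block < NBLOCKS:
                  comm.send(next_block, dest=status.Get_source())
                  next_block += 1
              else:
                  comm.send(None, dest=status.Get_source())    # stop worker
                  active -= 1
      else:                             # worker: request work until told to stop
          comm.send("ready", dest=0)
          while (block := comm.recv(source=0)) is not None:
              comm.send(("done", solve_block(block)), dest=0)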

  17. The BLAZE language - A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  18. The BLAZE language: A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.

  19. Parallelization and automatic data distribution for nuclear reactor simulations

    SciTech Connect

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed-of-light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  20. PMESH: A parallel mesh generator

    SciTech Connect

    Hardin, D.D.

    1994-10-21

    The Parallel Mesh Generation (PMESH) Project is a joint LDRD effort by A Division and Engineering to develop a unique mesh generation system that can construct large calculational meshes (of up to 10^9 elements) on massively parallel computers. Such a capability will remove a critical roadblock to unleashing the power of massively parallel processors (MPPs) for physical analysis. PMESH will support a variety of LLNL 3-D physics codes in the areas of electromagnetics, structural mechanics, thermal analysis, and hydrodynamics.

  1. Parallel computation with the force

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1985-01-01

    A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures, and performance implications.
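
    A rough Python analogue of the force idea, using a fixed team of threads that all execute the same routine, a prescheduled loop partition, and a team-wide barrier (illustrative only; the actual system generates multiprocessor FORTRAN from macros):

      import threading
      import numpy as np

      NPROCS = 4                        # size of the "force" of processes
      barrier = threading.Barrier(NPROCS)
      a = np.arange(16.0)
      b = np.zeros_like(a)

      def member(me):
          # prescheduled DOALL: member `me` takes every NPROCS-th iteration
          for i in range(me, len(a), NPROCS):
              b[i] = 2.0 * a[i]
          barrier.wait()                # the whole force synchronizes here
          if me == 0:                   # a section executed by one member only
              print("sum =", b.sum())

      team = [threading.Thread(target=member, args=(p,)) for p in range(NPROCS)]
      for t in team: t.start()
      for t in team: t.join()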

  2. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Lau, Sonie

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.

  3. A Programmable Preprocessor for Parallelizing Fortran-90

    SciTech Connect

    Rosing, Matthew; Yabusaki, Steven B.

    1999-07-01

    A programmable preprocessor that generates portable and efficient parallel Fortran-90 code has been successfully used in the development of a variety of environmental transport simulators for the Department of Energy. The tool provides the basic functionality of a traditional preprocessor where directives are embedded in a serial Fortran program and interpreted by the preprocessor to produce parallel Fortran code with MPI calls. The unique aspect of this work is that the user can make additions to, or modify, these directives. The directives reside in a preprocessor library and changes to this library can range from small changes to customize an existing library, to larger changes for porting a library, to completely replacing the library. The preprocessor is programmed with a library of directives written in a C-like language, called DL, that has added support for manipulating Fortran code fragments. The primary benefits to the user are twofold: it is fairly easy for any user to generate efficient, parallel code from Fortran-90 with embedded directives, and the long term viability of the user's software is guaranteed. This is because the source code will always run on a serial machine (the directives are transparent to standard Fortran compilers), and the preprocessor library can be modified to work with different hardware and software environments. A 4000 line preprocessor library has been written and used to parallelize roughly 50,000 lines of groundwater modeling code. The programs have been ported to a wide range of parallel architectures. Performance of these programs is similar to programs explicitly written for a parallel machine. Binaries of the preprocessor core, as well as the preprocessor library source code used in our groundwater modeling codes, are currently available.

  4. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining
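
    The mapping idea can be caricatured in a few lines of Python (a hypothetical mini-version for one dimension, not the Charon API):

      import numpy as np

      class Distributed1D:
          # Hypothetical stand-in for a distributed array: each "processor" p
          # owns one contiguous slice of the global index space.
          def __init__(self, global_array, nprocs):
              self.bounds = np.linspace(0, len(global_array),
                                        nprocs + 1).astype(int)
              self.chunks = [global_array[lo:hi].copy()
                             for lo, hi in zip(self.bounds[:-1], self.bounds[1:])]

          def to_local(self, i):
              # mapping function: global index -> (owning proc, local offset)
              p = int(np.searchsorted(self.bounds, i, side="right")) - 1
              return p, i - self.bounds[p]

          def get(self, i):             # read through the mapping, legacy-style
              p, off = self.to_local(i)
              return self.chunks[p][off]

      legacy = np.arange(10.0)          # the non-distributed legacy array
      dist = Distributed1D(legacy, nprocs=3)
      assert all(dist.get(i) == legacy[i] for i in range(10))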

  5. Parallel Adaptive Mesh Refinement

    SciTech Connect

    Diachin, L; Hornung, R; Plassmann, P; WIssink, A

    2005-03-04

    As large-scale, parallel computers have become more widely available and numerical models and algorithms have advanced, the range of physical phenomena that can be simulated has expanded dramatically. Many important science and engineering problems exhibit solutions with localized behavior where highly-detailed salient features or large gradients appear in certain regions which are separated by much larger regions where the solution is smooth. Examples include chemically-reacting flows with radiative heat transfer, high Reynolds number flows interacting with solid objects, and combustion problems where the flame front is essentially a two-dimensional sheet occupying a small part of a three-dimensional domain. Modeling such problems numerically requires approximating the governing partial differential equations on a discrete domain, or grid. Grid spacing is an important factor in determining the accuracy and cost of a computation. A fine grid may be needed to resolve key local features while a much coarser grid may suffice elsewhere. Employing a fine grid everywhere may be inefficient at best and, at worst, may make an adequately resolved simulation impractical. Moreover, the location and resolution of fine grid required for an accurate solution is a dynamic property of a problem's transient features and may not be known a priori. Adaptive mesh refinement (AMR) is a technique that can be used with both structured and unstructured meshes to adjust local grid spacing dynamically to capture solution features with an appropriate degree of resolution. Thus, computational resources can be focused where and when they are needed most to efficiently achieve an accurate solution without incurring the cost of a globally-fine grid. Figure 1.1 shows two example computations using AMR; on the left is a structured mesh calculation of an impulsively sheared contact surface and on the right is the fuselage and volume discretization of an RAH-66 Comanche helicopter [35].

  6. Parallel execution model for Prolog

    SciTech Connect

    Fagin, B.S.

    1987-01-01

    One candidate language for parallel symbolic computing is Prolog. Numerous ways for executing Prolog in parallel have been proposed, but current efforts suffer from several deficiencies. Many cannot support fundamental types of concurrency in Prolog. Other models are of purely theoretical interest, ignoring implementation costs. Detailed simulation studies of execution models are scarce; at present little is known about the costs and benefits of executing Prolog in parallel. In this thesis, a new parallel execution model for Prolog is presented: the PPP model or Parallel Prolog Processor. The PPP supports AND-parallelism, OR-parallelism, and intelligent backtracking. An implementation of the PPP is described, through the extension of an existing Prolog abstract machine architecture. Several examples of PPP execution are presented, and compilation to the PPP abstract instruction set is discussed. The performance effects of this model are reported, based on a simulation of a large benchmark set. The implications of these results for parallel Prolog systems are discussed, and directions for future work are indicated.

  7. Reordering computations for parallel execution

    NASA Technical Reports Server (NTRS)

    Adams, L.

    1985-01-01

    Computations in the SOR algorithm are reordered to obtain parallelism at different levels while maintaining the same asymptotic rate of convergence as the rowwise ordering. A parallel program is written to illustrate these ideas, and actual machines for implementing this program are discussed.
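
    One classic reordering of this kind is the red/black (odd-even) ordering, sketched below for a 2-D Poisson problem in NumPy: all points of one color can be updated simultaneously, then all points of the other color (a generic illustration, not the paper's program):

      import numpy as np

      def sor_redblack(u, f, h, omega=1.5, sweeps=100):
          # Red/black SOR for -laplace(u) = f with fixed boundary values.
          # Points of one color depend only on the other color, so each
          # half-sweep is a fully parallel update.
          for _ in range(sweeps):
              for color in (0, 1):
                  for i in range(1, u.shape[0] - 1):
                      j0 = 1 + (i + color) % 2
                      gs = 0.25 * (u[i-1, j0:-1:2] + u[i+1, j0:-1:2] +
                                   u[i, j0-1:-2:2] + u[i, j0+1::2] +
                                   h * h * f[i, j0:-1:2])
                      u[i, j0:-1:2] = (1 - omega) * u[i, j0:-1:2] + omega * gs
          return u

      n, h = 17, 1.0 / 16
      u = np.zeros((n, n))                  # zero Dirichlet boundary
      f = np.ones((n, n))
      sor_redblack(u, f, h)                 # u approaches the Poisson solution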

  8. Parallelizing Monte Carlo with PMC

    SciTech Connect

    Rathkopf, J.A.; Jones, T.R.; Nessett, D.M.; Stanberry, L.C.

    1994-11-01

    PMC (Parallel Monte Carlo) is a system of generic interface routines that allows easy porting of Monte Carlo packages of large-scale physics simulation codes to Massively Parallel Processor (MPP) computers. By loading various versions of PMC, simulation code developers can configure their codes to run in several modes: serial, Monte Carlo runs on the same processor as the rest of the code; parallel, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on other MPP processor(s); distributed, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on a different machine. This multi-mode approach allows maintenance of a single simulation code source regardless of the target machine. PMC handles passing of messages between nodes on the MPP, passing of messages between a different machine and the MPP, distributing work between nodes, and providing independent, reproducible sequences of random numbers. Several production codes have been parallelized under the PMC system. Excellent parallel efficiency in both the distributed and parallel modes results if sufficient workload is available per processor. Experiences with a Monte Carlo photonics demonstration code and a Monte Carlo neutronics package are described.

  9. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  10. Parallel contingency statistics with Titan.

    SciTech Connect

    Thompson, David C.; Pebay, Philippe Pierre

    2009-09-01

    This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09], which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of this new parallel engine is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevents this new engine from exhibiting the optimal parallel speed-up that the aforementioned engines do. This report therefore discusses the design trade-offs we made and studies performance with up to 200 processors.
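
    The basic parallel pattern is easy to sketch with mpi4py: every process tabulates its local rows and the root merges the partial tables (a generic illustration in Python rather than the report's C++; note that the communication volume grows with the number of distinct category pairs, which is the scalability obstacle mentioned above):

      from collections import Counter
      from mpi4py import MPI

      comm = MPI.COMM_WORLD

      # placeholder local data: (category_A, category_B) pairs on this rank
      local_rows = [("x", "u"), ("x", "v"), ("y", "u")][: comm.Get_rank() + 1]

      partial = Counter(local_rows)            # local contingency counts
      partials = comm.gather(partial, root=0)  # ship partial tables to root

      if comm.Get_rank() == 0:
          table = sum(partials, Counter())     # merge the partial tables
          print(dict(table))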

  11. Hebbian learning in parallel and modular memories.

    PubMed

    Poon, C S; Shah, J V

    1998-02-01

    Many cognitive and sensorimotor functions in the brain involve parallel and modular memory subsystems that are adapted by activity-dependent Hebbian synaptic plasticity. This is in contrast to the multilayer perceptron model of supervised learning where sensory information is presumed to be integrated by a common pool of hidden units through backpropagation learning. Here we show that Hebbian learning in parallel and modular memories is more advantageous than backpropagation learning in lumped memories in two respects: it is computationally much more efficient and structurally much simpler to implement with biological neurons. Accordingly, we propose a more biologically relevant neural network model, called a tree-like perceptron, which is a simple modification of the multilayer perceptron model to account for the general neural architecture, neuronal specificity, and synaptic learning rule in the brain. The model features a parallel and modular architecture in which adaptation of the input-to-hidden connection follows either a Hebbian or anti-Hebbian rule depending on whether the hidden units are excitatory or inhibitory, respectively. The proposed parallel and modular architecture and implicit interplay between the types of synaptic plasticity and neuronal specificity are exhibited by some neocortical and cerebellar systems. PMID:9525034
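
    A minimal sketch of the learning rule this describes: the weight update is local (a product of pre- and postsynaptic activity), with the sign flipped for inhibitory hidden units (a toy illustration, not the authors' simulations):

      import numpy as np

      rng = np.random.default_rng(0)
      n_in, n_hidden = 8, 4
      W = 0.01 * rng.standard_normal((n_hidden, n_in))
      excitatory = np.array([True, True, False, True])  # fixed unit types
      eta = 0.05

      def hebbian_step(x):
          y = np.tanh(W @ x)                       # hidden-unit activity
          sign = np.where(excitatory, 1.0, -1.0)   # anti-Hebbian if inhibitory
          # local rule: dW[i, j] depends only on x[j] and y[i], so no error
          # signal needs to be backpropagated through the network
          W[...] += eta * sign[:, None] * np.outer(y, x)

      for _ in range(100):
          hebbian_step(rng.standard_normal(n_in))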

  12. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Lau, Sonie; Yan, Jerry C.

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

  13. Parallel NPARC: Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Townsend, S. E.

    1996-01-01

    Version 3 of the NPARC Navier-Stokes code includes support for large-grain (block level) parallelism using explicit message passing between a heterogeneous collection of computers. This capability has the potential for significant performance gains, depending upon the block data distribution. The parallel implementation uses a master/worker arrangement of processes. The master process assigns blocks to workers, controls worker actions, and provides remote file access for the workers. The processes communicate via explicit message passing using an interface library which provides portability to a number of message passing libraries, such as PVM (Parallel Virtual Machine). A Bourne shell script is used to simplify the task of selecting hosts, starting processes, retrieving remote files, and terminating a computation. This script also provides a simple form of fault tolerance. An analysis of the computational performance of NPARC is presented, using data sets from an F/A-18 inlet study and a Rocket Based Combined Cycle Engine analysis. Parallel speedup and overall computational efficiency were obtained for various NPARC run parameters on a cluster of IBM RS6000 workstations. The data show that although NPARC performance compares favorably with the estimated potential parallelism, typical data sets used with previous versions of NPARC will often need to be reblocked for optimum parallel performance. In one of the cases studied, reblocking increased peak parallel speedup from 3.2 to 11.8.

  14. Parallel incremental compilation. Doctoral thesis

    SciTech Connect

    Gafter, N.M.

    1990-06-01

    The time it takes to compile a large program has been a bottleneck in the software development process. When an interactive programming environment with an incremental compiler is used, compilation speed becomes even more important, but existing incremental compilers are very slow for some types of program changes. We describe a set of techniques that enable incremental compilation to exploit fine-grained concurrency in a shared-memory multi-processor and achieve asymptotic improvement over sequential algorithms. Because parallel non-incremental compilation is a special case of parallel incremental compilation, the design of a parallel compiler is a corollary of our result. Instead of running the individual phases concurrently, our design specifies compiler phases that are mutually sequential. However, each phase is designed to exploit fine-grained parallelism. By allowing each phase to present its output as a complete structure rather than as a stream of data, we can apply techniques such as parallel prefix and parallel divide-and-conquer, and we can construct applicative data structures to achieve sublinear execution time. Parallel algorithms for each phase of a compiler are presented to demonstrate that a complete incremental compiler can achieve execution time that is asymptotically less than sequential algorithms.

  15. EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS

    SciTech Connect

    F. PETRINI; W. FENG

    1999-09-01

    We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of flow-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.

  16. Parallel integer sorting with medium and fine-scale parallelism

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.

  17. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens; Inglett, Todd Alan

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
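
    The block-comparison idea behind the patent can be sketched in a few lines, with ordinary checksums standing in for the rsync-style comparison (illustrative only, not the patented implementation):

      import hashlib

      BLOCK = 4096

      def block_digests(data: bytes):
          return [hashlib.sha1(data[i:i + BLOCK]).digest()
                  for i in range(0, len(data), BLOCK)]

      def delta_against_template(checkpoint: bytes, template_digests):
          # keep only the blocks whose checksum differs from the template's
          delta = []
          for k, dig in enumerate(block_digests(checkpoint)):
              if k >= len(template_digests) or dig != template_digests[k]:
                  delta.append((k, checkpoint[k * BLOCK:(k + 1) * BLOCK]))
          return delta

      template = b"A" * (4 * BLOCK)
      node_state = b"A" * (2 * BLOCK) + b"B" * BLOCK + b"A" * BLOCK
      changed = delta_against_template(node_state, block_digests(template))
      print([k for k, _ in changed])    # -> [2]: only one block must be stored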

  18. Parallel coupled perturbed CASSCF equations and analytic CASSCF second derivatives.

    PubMed

    Dudley, Timothy J; Olson, Ryan M; Schmidt, Michael W; Gordon, Mark S

    2006-02-01

    A parallel algorithm for solving the coupled-perturbed MCSCF (CPMCSCF) equations and analytic nuclear second derivatives of CASSCF wave functions is presented. A parallel scheme for evaluating derivative integrals and their subsequent use in constructing other derivative quantities is described. The task of solving the CPMCSCF equations is approached using a parallelization scheme that partitions the electronic hessian matrix over all processors, as opposed to simple partitioning of the 3N solution vectors among the processors. The scalability of the current algorithm, up to 128 processors, is demonstrated. Using three test cases, results indicate that the parallelization of derivative integral evaluation through a simple scheme is highly effective regardless of the size of the basis set employed in the CASSCF energy calculation. Parallelization of the construction of the MCSCF electronic hessian during solution of the CPMCSCF equations varies quantitatively depending on the nature of the hessian itself, but is highly scalable in all cases. PMID:16365869

  19. Parallel Architecture For Robotics Computation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1990-01-01

    Universal Real-Time Robotic Controller and Simulator (URRCS) is highly parallel computing architecture for control and simulation of robot motion. Result of extensive algorithmic study of different kinematic and dynamic computational problems arising in control and simulation of robot motion. Study led to development of class of efficient parallel algorithms for these problems. Represents algorithmically specialized architecture, in sense capable of exploiting common properties of this class of parallel algorithms. System with both MIMD and SIMD capabilities. Regarded as processor attached to bus of external host processor, as part of bus memory.

  20. Multigrid on massively parallel architectures

    SciTech Connect

    Falgout, R D; Jones, J E

    1999-09-17

    The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.

  1. Parallel inverse iteration with reorthogonalization

    SciTech Connect

    Fann, G.I.; Littlefield, R.J.

    1993-03-01

    A parallel method for finding orthogonal eigenvectors of real symmetric tridiagonal matrices is described. The method uses inverse iteration with repeated Modified Gram-Schmidt (MGS) reorthogonalization of the unconverged iterates for clustered eigenvalues. This approach is more parallelizable than reorthogonalizing against fully converged eigenvectors, as is done by LAPACK's current DSTEIN routine. The new method is found to provide accuracy and speed comparable to DSTEIN's and to have good parallel scalability even for matrices with large clusters of eigenvalues. We present results for residual and orthogonality tests, plus timings on IBM RS/6000 (sequential) and Intel Touchstone DELTA (parallel) computers.
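
    A dense NumPy sketch of the scheme: inverse iteration on a cluster of nearby eigenvalue approximations, reorthogonalizing the still-unconverged iterates with MGS on every pass (illustrative; a real implementation would exploit the tridiagonal structure instead of dense solves):

      import numpy as np

      def inverse_iteration_cluster(T, lams, iters=4, seed=1):
          n = T.shape[0]
          V = np.random.default_rng(seed).standard_normal((n, len(lams)))
          I = np.eye(n)
          for _ in range(iters):
              for j, lam in enumerate(lams):      # independent shifted solves
                  V[:, j] = np.linalg.solve(T - lam * I, V[:, j])
              for j in range(V.shape[1]):         # MGS across the cluster
                  for k in range(j):
                      V[:, j] -= (V[:, k] @ V[:, j]) * V[:, k]
                  V[:, j] /= np.linalg.norm(V[:, j])
          return V

      n = 50      # 1-D Laplacian: symmetric tridiagonal test matrix
      T = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
           - np.diag(np.ones(n - 1), -1))
      lams = np.linalg.eigvalsh(T)[:3] + 1e-8   # three smallest, perturbed
      V = inverse_iteration_cluster(T, lams)
      assert np.allclose(V.T @ V, np.eye(3))    # orthonormal eigenvector estimates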

  3. Hydrologic Terrain Processing Using Parallel Computing

    NASA Astrophysics Data System (ADS)

    Tarboton, D. G.; Watson, D. W.; Wallace, R. M.; Schreuders, K.; Tesfa, T. K.

    2009-12-01

    Topography in the form of Digital Elevation Models (DEMs), is widely used to derive information for the modeling of hydrologic processes. Hydrologic terrain analysis augments the information content of digital elevation data by removing spurious pits, deriving a structured flow field, and calculating surfaces of hydrologic information derived from the flow field. The increasing availability of high-resolution terrain datasets for large areas poses a challenge for existing algorithms that process terrain data to extract this hydrologic information. This paper will describe parallel algorithms that have been developed to enhance hydrologic terrain pre-processing so that larger datasets can be more efficiently computed. Message Passing Interface (MPI) parallel implementations have been developed for pit removal, flow direction, and generalized flow accumulation methods within the Terrain Analysis Using Digital Elevation Models (TauDEM) package. The parallel algorithm works by decomposing the domain into striped or tiled data partitions where each tile is processed by a separate processor. This method also reduces the memory requirements of each processor so that larger size grids can be processed. The parallel pit removal algorithm is adapted from the method of Planchon and Darboux that starts from a high elevation then progressively scans the grid, lowering each grid cell to the maximum of the original elevation or the lowest neighbor. The MPI implementation reconciles elevations along process domain edges after each scan. Generalized flow accumulation extends flow accumulation approaches commonly available in GIS through the integration of multiple inputs and a broad class of algebraic rules into the calculation of flow related quantities. It is based on establishing a flow field through DEM grid cells, that is then used to evaluate any mathematical function that incorporates dependence on values of the quantity being evaluated at upslope (or downslope) grid cells
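
    A serial sketch of the iterative Planchon-Darboux scheme that the parallel algorithm adapts (pure Python for clarity; the MPI version partitions the grid into stripes or tiles and reconciles edge values after each scan):

      import numpy as np

      def fill_pits(dem, eps=1e-4):
          # Start the water surface w at +infinity except on the boundary,
          # then repeatedly lower each cell to max(original elevation,
          # lowest neighbor + eps) until nothing changes.
          w = np.full(dem.shape, np.inf)
          w[0, :], w[-1, :] = dem[0, :], dem[-1, :]
          w[:, 0], w[:, -1] = dem[:, 0], dem[:, -1]
          changed = True
          while changed:
              changed = False
              for i in range(1, dem.shape[0] - 1):
                  for j in range(1, dem.shape[1] - 1):
                      lowest = min(w[i-1, j], w[i+1, j], w[i, j-1], w[i, j+1])
                      cand = max(dem[i, j], lowest + eps)
                      if cand < w[i, j]:
                          w[i, j], changed = cand, True
          return w

      dem = np.array([[5, 5, 5, 5],
                      [5, 1, 4, 5],
                      [5, 4, 4, 5],
                      [5, 5, 5, 5]], float)
      print(fill_pits(dem)[1, 1])   # pit raised to the rim's spill level (~5.0001)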

  4. Appendix E: Parallel Pascal development system

    NASA Technical Reports Server (NTRS)

    1985-01-01

    The Parallel Pascal Development System enables Parallel Pascal programs to be developed and tested on a conventional computer. It consists of several system programs, including a Parallel Pascal to standard Pascal translator, and a library of Parallel Pascal subprograms. The library includes subprograms for using Parallel Pascal on a parallel system with a fixed degree of parallelism, such as the Massively Parallel Processor, to conveniently manipulate arrays whose dimensions differ from those of the hardware. Programs can be conveniently tested with small arrays on the conventional computer before being run on a parallel system.

  5. Massively parallel sequencing and rare disease

    PubMed Central

    Ng, Sarah B.; Nickerson, Deborah A.; Bamshad, Michael J.; Shendure, Jay

    2010-01-01

    Massively parallel sequencing has enabled the rapid, systematic identification of variants on a large scale. This has, in turn, accelerated the pace of gene discovery and disease diagnosis on a molecular level and has the potential to revolutionize methods particularly for the analysis of Mendelian disease. Using massively parallel sequencing has enabled investigators to interrogate variants both in the context of linkage intervals and also on a genome-wide scale, in the absence of linkage information entirely. The primary challenge now is to distinguish between background polymorphisms and pathogenic mutations. Recently developed strategies for rare monogenic disorders have met with some early success. These strategies include filtering for potential causal variants based on frequency and function, and also ranking variants based on conservation scores and predicted deleteriousness to protein structure. Here, we review the recent literature in the use of high-throughput sequence data and its analysis in the discovery of causal mutations for rare disorders. PMID:20846941

  6. Parallel software tools at Langley Research Center

    NASA Technical Reports Server (NTRS)

    Moitra, Stuti; Tennille, Geoffrey M.; Lakeotes, Christopher D.; Randall, Donald P.; Arthur, Jarvis J.; Hammond, Dana P.; Mall, Gerald H.

    1993-01-01

    This document gives a brief overview of parallel software tools available on the Intel iPSC/860 parallel computer at Langley Research Center. It is intended to provide a source of information that is somewhat more concise than vendor-supplied material on the purpose and use of various tools. Each of the chapters on tools is organized in a similar manner covering an overview of the functionality, access information, how to effectively use the tool, observations about the tool and how it compares to similar software, known problems or shortfalls with the software, and reference documentation. It is primarily intended for users of the iPSC/860 at Langley Research Center and is appropriate for both the experienced and novice user.

  7. Oxytocin: parallel processing in the social brain?

    PubMed

    Dölen, Gül

    2015-06-01

    Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. PMID:25912257

  8. Flexibility and Performance of Parallel File Systems

    NASA Technical Reports Server (NTRS)

    Kotz, David; Nieuwejaar, Nils

    1996-01-01

    As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions make applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (API's). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.

  9. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

  10. Turbomachinery CFD on parallel computers

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.

    1992-01-01

    The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.

  11. Predicting performance of parallel computations

    NASA Technical Reports Server (NTRS)

    Mak, Victor W.; Lundstrom, Stephen F.

    1990-01-01

    An accurate and computationally efficient method for predicting the performance of a class of parallel computations running on concurrent systems is described. A parallel computation is modeled as a task system with precedence relationships expressed as a series-parallel directed acyclic graph. Resources in a concurrent system are modeled as service centers in a queuing network model. Using these two models as inputs, the method outputs predictions of expected execution time of the parallel computation and the concurrent system utilization. The method is validated against both detailed simulation and actual execution on a commercial multiprocessor. Using 100 test cases, the average error of the prediction when compared to simulation statistics is 1.7 percent, with a standard deviation of 1.5 percent; the maximum error is about 10 percent.
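
    Ignoring queuing delays (which the full method captures with the queuing network model), the recursion over a series-parallel task graph is just sum and max; a hypothetical mini-evaluator:

      def expected_time(node):
          # node is ("task", time), ("serial", [children]), or
          # ("parallel", [children]); series composition adds expected
          # times, parallel composition takes their maximum.
          kind, payload = node
          if kind == "task":
              return payload
          times = [expected_time(child) for child in payload]
          return sum(times) if kind == "serial" else max(times)

      graph = ("serial", [("task", 2.0),
                          ("parallel", [("task", 4.0),
                                        ("serial", [("task", 1.0),
                                                    ("task", 2.0)])]),
                          ("task", 1.0)])
      print(expected_time(graph))   # 2 + max(4, 1 + 2) + 1 = 7.0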

  12. Parallel hierarchical method in networks

    NASA Astrophysics Data System (ADS)

    Malinochka, Olha; Tymchenko, Leonid

    2007-09-01

    The method of parallel-hierarchical Q-transformation offers a new approach to the creation of a computing medium: parallel-hierarchical (PH) networks, investigated as a model of a neuron-like data processing scheme [1-5]. The approach has a number of advantages compared with other methods of forming neuron-like media (for example, the known methods of constructing artificial neural networks). Its main advantage is the use of multilevel parallel interaction dynamics of information signals at different hierarchy levels of computer networks, which makes it possible to exploit such known natural features of the organization of computations as the topographic nature of mapping, the simultaneity (parallelism) of signal operation, the inlaid structure of the cortex, the rough hierarchy of the cortex, and the spatially correlated, time-coordinated mechanism of perception and training [5].

  13. "Feeling" Series and Parallel Resistances.

    ERIC Educational Resources Information Center

    Morse, Robert A.

    1993-01-01

    Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)

  14. Demonstrating Forces between Parallel Wires.

    ERIC Educational Resources Information Center

    Baker, Blane

    2000-01-01

    Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)
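
    For reference, the underlying result the demonstration illustrates is the standard force per unit length between two long parallel wires carrying currents I_1 and I_2 separated by a distance d:

      \frac{F}{L} = \frac{\mu_0 I_1 I_2}{2 \pi d},

    attractive when the currents are parallel and repulsive when they are antiparallel.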

  15. Parallel computation using limited resources

    SciTech Connect

    Sugla, B.

    1985-01-01

    This thesis addresses itself to the task of designing and analyzing parallel algorithms when the resources of processors, communication, and time are limited. The two parts of this thesis deal with multiprocessor systems and VLSI - the two important parallel processing environments that are prevalent today. In the first part a time-processor-communication tradeoff analysis is conducted for two kinds of problems - N input, 1 output, and N input, N output computations. In the class of problems of the second kind, the problem of prefix computation, an important problem due to the number of naturally occurring computations it can model, is studied. Finally, a general methodology is given for design of parallel algorithms that can be used to optimize a given design to a wide set of architectural variations. The second part of the thesis considers the design of parallel algorithms for the VLSI model of computation when the resource of time is severely restricted.
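
    Prefix computation is a canonical example of such tradeoffs; a NumPy sketch of the data-parallel (Hillis-Steele) scan, which spends O(n log n) total operations to finish in O(log n) parallel steps:

      import numpy as np

      def inclusive_scan(x):
          # Hillis-Steele scan: ceil(log2(n)) rounds; in the round with
          # stride s, every element adds the value s positions to its left,
          # and all additions within a round are independent (parallel).
          x = np.asarray(x).copy()
          s = 1
          while s < len(x):
              x[s:] = x[s:] + x[:-s]
              s *= 2
          return x

      print(inclusive_scan([1, 2, 3, 4, 5]))   # -> [ 1  3  6 10 15]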

  16. Parallel algorithms for message decomposition

    SciTech Connect

    Teng, S.H.; Wang, B.

    1987-06-01

    The authors consider the deterministic and random parallel complexity (time and processors) of message decoding: an essential problem in communications systems and translation systems. They present an optimal parallel algorithm to decompose prefix-coded messages and uniquely decipherable coded messages in O(n/P) time, using O(P) processors (for all P with 1 ≤ P ≤ n/log n), deterministically as well as randomly, on the weakest version of parallel random access machines, in which concurrent read and concurrent write to a cell in the common memory are not allowed. This is done by reducing decoding to parallel finite-state automata simulation and parallel prefix sums.

  17. HEATR project: ATR algorithm parallelization

    NASA Astrophysics Data System (ADS)

    Deardorf, Catherine E.

    1998-09-01

    High Performance Computing (HPC) Embedded Application for Target Recognition (HEATR) is a project funded by the High Performance Computing Modernization Office through the Common HPC Software Support Initiative (CHSSI). The goal of CHSSI is to produce portable, parallel, multi-purpose, freely distributable support software to exploit emerging parallel computing technologies and enable application of scalable HPCs for various critical DoD applications. Specifically, the CHSSI goal for HEATR is to provide portable, parallel versions of several existing ATR detection and classification algorithms to the ATR-user community to achieve near real-time capability. The HEATR project will create parallel versions of existing automatic target recognition (ATR) detection and classification algorithms and generate reusable code that will support the porting and software development process for ATR HPC software. The HEATR team has selected detection/classification algorithms from both the model-based and training-based (template-based) arenas in order to consider the parallelization requirements for detection/classification algorithms across ATR technology. This allows the team to assess the impact that parallelization would have on detection/classification performance across ATR technology. A field demo is included in this project. Finally, any parallel tools produced to support the project will be refined and returned to the ATR user community along with the parallel ATR algorithms. This paper reviews: (1) the HPCMP structure as it relates to HEATR, (2) the overall structure of the HEATR project, (3) preliminary results for the first algorithm Alpha Test, (4) CHSSI requirements for HEATR, and (5) project management issues and lessons learned.

  18. Modelling Layer parallel stylolites

    NASA Astrophysics Data System (ADS)

    Koehn, Daniel; Pataki Rood, Daisy; Beaudoin, Nicolas

    2016-04-01

    We modeled the geometrical roughening of mainly layer-dominated stylolites in order to understand their structural evolution, to present an advanced classification of stylolite shapes and to relate this classification to chemical compaction and stylolite sealing capabilities. Our simulations show that layer-dominated stylolites can grow in three distinct stages, an initial slow nucleation, a fast layer-pinning phase and a final freezing stage if the layer dissolves completely during growth. Dissolution of the pinning layer and thus destruction of the compaction tracking capabilities is a function of the background noise in the rock and the dissolution rate of the layer itself. Low background noise needs a slower dissolving layer for pinning to be successful but produces flatter teeth than higher background noise. We present an advanced classification based on our simulations and separate stylolites into four classes: rectangular layer type, seismogram pinning type, suture/sharp peak type and simple wave-like type.

  19. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.

  20. Efficiency of parallel direct optimization.

    PubMed

    Janies, D A; Wheeler, W C

    2001-03-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. PMID:12240679
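
    For readers new to these metrics: the speedup and parallel efficiency discussed throughout the abstract are simple ratios. The timings below are invented purely for illustration:

```python
# speedup S(p) = T(1) / T(p); parallel efficiency E(p) = S(p) / p.
def speedup_and_efficiency(t1, tp, p):
    s = t1 / tp
    return s, s / p

# Hypothetical runtimes showing the plateau behavior described above.
for p, tp in [(16, 70.0), (32, 40.0), (64, 39.0)]:
    s, e = speedup_and_efficiency(1000.0, tp, p)
    print(f"p={p:3d}  speedup={s:6.1f}  efficiency={e:5.2f}")
```

    Beyond the point where speedup plateaus (here p > 32), added processors only dilute efficiency, mirroring the branch-swapping result reported above.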

  1. The NICMOS Parallel Observing Program

    NASA Astrophysics Data System (ADS)

    McCarthy, Patrick

    2002-07-01

    We propose to manage the default set of pure parallels with NICMOS. Our experience with both our GO NICMOS parallel program and the public parallel NICMOS programs in Cycle 7 prepared us to make optimal use of the parallel opportunities. The NICMOS G141 grism remains the most powerful survey tool for Hα emission-line galaxies at cosmologically interesting redshifts. It is particularly well suited to addressing two key uncertainties regarding the global history of star formation: the peak rate of star formation in the relatively unexplored but critical 1 ≤ z ≤ 2 epoch, and the amount of star formation missing from UV continuum-based estimates due to high extinction. Our proposed deep G141 exposures will increase the sample of known Hα emission-line objects at z ~ 1.3 by roughly an order of magnitude. We will also obtain a mix of F110W and F160W images along random sight-lines to examine the space density and morphologies of the reddest galaxies. The nature of the extremely red galaxies remains unclear, and our program of imaging and grism spectroscopy provides unique information regarding both the incidence of obscured starbursts and the build-up of stellar mass at intermediate redshifts. In addition to carrying out the parallel program we will populate a public database with calibrated spectra and images, and provide limited ground-based optical and near-IR data for the deepest parallel fields.

  2. Memory Scalability and Efficiency Analysis of Parallel Codes

    SciTech Connect

    Janjusic, Tommy; Kartsaklis, Christos

    2015-01-01

    Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for high-performance systems are typically developed over the span of several, and in some instances 10+, years. As a result, optimization practices which were appropriate for earlier systems may no longer be valid and thus require careful reconsideration. Specifically, parallel codes whose memory footprint is a function of their scalability must be carefully considered for future exa-scale systems. In this paper we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology we evaluate an application's memory footprint as a function of scalability, a quantity we have coined "memory efficiency", and describe our results. In particular, using our in-house tools we can pinpoint the specific application components which contribute to the application's overall memory footprint (application data structures, libraries, etc.).
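
    As a rough sketch of the coined quantity (a stand-in definition of ours, not the paper's tool): if an ideally memory-scalable code keeps a constant total footprint as processes are added, efficiency can be expressed as the baseline footprint over the measured one:

```python
# Hypothetical "memory efficiency" metric: ratio of the baseline total
# footprint to the footprint measured at each scale. The definition and
# numbers are illustrative assumptions, not the paper's methodology.
def memory_efficiency(footprints):
    """footprints: {process_count: total footprint in MB}."""
    base = footprints[min(footprints)]
    return {p: base / fp for p, fp in sorted(footprints.items())}

# Replicated data structures inflate the total footprint as the code scales.
print(memory_efficiency({1: 800.0, 2: 1000.0, 4: 1400.0, 8: 2200.0}))
```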

  3. CALTRANS: A parallel, deterministic, 3D neutronics code

    SciTech Connect

    Carson, L.; Ferguson, J.; Rogers, J.

    1994-04-01

    Our efforts to parallelize the deterministic solution of the neutron transport equation have culminated in a new neutronics code, CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementations of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.

  4. Nemesis I: Parallel Enhancements to ExodusII

    Energy Science and Technology Software Center (ESTSC)

    2006-03-28

    NEMESIS I is an enhancement to the EXODUS II finite element database model used to store and retrieve data for unstructured parallel finite element analyses. NEMESIS I adds data structures which facilitate the partitioning of a scalar (standard serial) EXODUS II file onto the parallel disk systems found on many parallel computers. Since the NEMESIS I application programming interface (API) can be used to append information to an existing EXODUS II file, software that reads EXODUS II files can be used on files which contain NEMESIS I information. The NEMESIS I information is written and read via C or C++ callable functions which comprise the NEMESIS I API.

  5. Programming parallel architectures: The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  6. Chare kernel; A runtime support system for parallel computations

    SciTech Connect

    Shu, W. ); Kale, L.V. )

    1991-03-01

    This paper presents the chare kernel system, which supports parallel computations with irregular structure. The chare kernel is a collection of primitive functions that manage chares, manipulate messages, invoke atomic computations, and coordinate concurrent activities. Programs written in the chare kernel language can be executed on different parallel machines without change. Users writing such programs concern themselves with the creation of parallel actions but not with assigning them to specific processors. The authors describe the design and implementation of the chare kernel. Performance of chare kernel programs on two hypercube machines, the Intel iPSC/2 and the NCUBE, is also given.

  7. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-18

    Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
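
    PAMI itself is IBM-specific, so no PAMI calls are shown here; as a hedged analogy, MPI-3 non-blocking collectives (via mpi4py) exhibit the same pattern the patent describes: post a collective operation, overlap local work, then complete it.

```python
# Analogy only: an MPI-3 non-blocking broadcast, not the patented PAMI API.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
data = np.arange(4, dtype='i') if rank == 0 else np.empty(4, dtype='i')

req = comm.Ibcast(data, root=0)               # post collective, do not block
busywork = sum(i * i for i in range(10_000))  # overlap local computation
req.Wait()                                    # complete the collective

print(rank, data.tolist(), busywork)
```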

  8. Parallel-distributed mobile robot simulator

    NASA Astrophysics Data System (ADS)

    Okada, Hiroyuki; Sekiguchi, Minoru; Watanabe, Nobuo

    1996-06-01

    The aim of this project is to achieve an autonomous learning and growth function based on active interaction with the real world. The system should also be able to autonomously acquire knowledge about the context in which jobs take place and how the jobs are executed. This article describes a parallel distributed mobile robot simulator with an autonomous learning and growth function. The autonomous learning and growth function which we are proposing is characterized by its ability to learn and grow through interaction with the real world. When the mobile robot interacts with the real world, the system compares the virtual environment simulation with the interaction result in the real world. The system then improves the virtual environment to match the real-world result more closely. Thus the system learns and grows. It is very important that such a simulation be time-realistic. The parallel distributed mobile robot simulator was developed to simulate the space of a mobile robot system with an autonomous learning and growth function. The simulator constructs a virtual space faithful to the real world and also integrates the interfaces between the user, the actual mobile robot, and the virtual mobile robot. Using an ultrafast CG (computer graphics) system (FUJITSU AG series), time-realistic 3D CG is displayed.

  9. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  10. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  11. Interface for Parallel I/O from Componentized Visualization Algorithms

    Energy Science and Technology Software Center (ESTSC)

    2008-09-16

    The software is an interface layer over file I/O with features specifically designed for efficient parallel reads and writes. The interface provides multiple concrete implementations that easily allow the replacement of one interface with another. This feature allows a reader or writer implementation to work independently of whether parallel file I/O is available or desired. The software also contains extensions to some readers that allow them to use the file I/O functionality.

  12. The economics of parallel trade.

    PubMed

    Danzon, P M

    1998-03-01

    The potential for parallel trade in the European Union (EU) has grown with the accession of low price countries and the harmonisation of registration requirements. Parallel trade implies a conflict between the principle of autonomy of member states to set their own pharmaceutical prices, the principle of free trade and the industrial policy goal of promoting innovative research and development (R&D). Parallel trade in pharmaceuticals does not yield the normal efficiency gains from trade because countries achieve low pharmaceutical prices by aggressive regulation, not through superior efficiency. In fact, parallel trade reduces economic welfare by undermining price differentials between markets. Pharmaceutical R&D is a global joint cost of serving all consumers worldwide; it accounts for roughly 30% of total costs. Optimal (welfare maximising) pricing to cover joint costs (Ramsey pricing) requires setting different prices in different markets, based on inverse demand elasticities. By contrast, parallel trade and regulation based on international price comparisons tend to force price convergence across markets. In response, manufacturers attempt to set a uniform 'euro' price. The primary losers from 'euro' pricing will be consumers in low income countries who will face higher prices or loss of access to new drugs. In the long run, even higher income countries are likely to be worse off with uniform prices, because fewer drugs will be developed. One policy option to preserve price differentials is to exempt on-patent products from parallel trade. An alternative is confidential contracting between individual manufacturers and governments to provide country-specific ex post discounts from the single 'euro' wholesale price, similar to rebates used by managed care in the US. This would preserve differentials in transactions prices even if parallel trade forces convergence of wholesale prices. PMID:10178655

  13. Kalman Filter Tracking on Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2015-12-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques, including Cellular Automata and a return to the Hough Transform. The most common track finding techniques in use today are, however, those based on the Kalman Filter [2]. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are exactly those being used today for the design of the tracking system for the HL-LHC. Our previous investigations showed that, using optimized data structures, track fitting with the Kalman Filter can achieve large speedups both with Intel Xeon and Xeon Phi. We report here our further progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a realistic simulation setup.
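
    To make the vectorization point concrete, here is a toy, hedged sketch (a 1D state, with names of our choosing) of a Kalman measurement update applied to many track candidates in a single array-wide pass, the data layout that lets wide vector units do the work:

```python
# Toy vectorized Kalman-filter measurement update across many tracks at once.
# Real track fits use multi-dimensional states and covariance matrices.
import numpy as np

def kf_update(x, P, z, R):
    """x, P: per-track state estimate and variance; z, R: measurement, noise."""
    K = P / (P + R)                      # Kalman gain, elementwise over tracks
    return x + K * (z - x), (1.0 - K) * P

n_tracks = 1_000_000
x, P = np.zeros(n_tracks), np.ones(n_tracks)
z = np.random.normal(0.5, 0.1, n_tracks)
x, P = kf_update(x, P, z, R=0.01)   # one SIMD-friendly pass updates every track
print(x.mean(), P.mean())
```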

  14. On the scalability of parallel genetic algorithms.

    PubMed

    Cantú-Paz, E; Goldberg, D E

    1999-01-01

    This paper examines the scalability of several types of parallel genetic algorithms (GAs). The objective is to determine the optimal number of processors that can be used by each type to minimize the execution time. The first part of the paper considers algorithms with a single population. The investigation focuses on an implementation where the population is distributed to several processors, but the results are applicable to more common master-slave implementations, where the population is entirely stored in a master processor and multiple slaves are used to evaluate the fitness. The second part of the paper deals with parallel GAs with multiple populations. It first considers a bounding case where the connectivity, the migration rate, and the frequency of migrations are set to their maximal values. Then, arbitrary regular topologies with lower migration rates are considered and the frequency of migrations is set to its lowest value. The investigation is mainly theoretical, but experimental evidence with an additively-decomposable function is included to illustrate the accuracy of the theory. In all cases, the calculations show that the optimal number of processors that minimizes the execution time is directly proportional to the square root of the population size and the fitness evaluation time. Since these two factors usually increase as the domain becomes more difficult, the results of the paper suggest that parallel GAs can integrate large numbers of processors and significantly reduce the execution time of many practical applications. PMID:10578030
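
    The headline scaling can be turned into a back-of-the-envelope calculator. The closed form below is the familiar master-slave result S* = sqrt(n·t_f/t_c); the parameter names are our own illustrative choices:

```python
from math import sqrt

# Optimal slave count for a master-slave parallel GA, proportional to the
# square root of population size and fitness-evaluation time (see above).
def optimal_slaves(n, t_f, t_c):
    """n: population size; t_f: one fitness evaluation (s);
    t_c: per-individual communication cost (s)."""
    return sqrt(n * t_f / t_c)

# Example: 1600 individuals, 0.1 s evaluations, 1 ms communication each.
print(optimal_slaves(1600, 0.1, 0.001))   # -> 400.0 processors
```

    Because harder domains tend to raise both the population size and the evaluation time, the optimum grows with problem difficulty, which is the paper's central point.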

  15. Parallel applications of the USNRC consolidated code

    NASA Astrophysics Data System (ADS)

    Gan, Jun; Downar, Thomas J.; Mahaffy, John H.; Uhle, Jennifer L.

    2001-07-01

    The United States Nuclear Regulatory Commission has developed the thermal-hydraulic analysis code TRAC-M to consolidate the capabilities of its suite of reactor safety analysis codes. One of the requirements for the new consolidated code is that it support parallel computations to extend code functionality and to improve execution speed. A flexible, request-driven Exterior Communication Interface (ECI) was developed at Penn State University for use with the consolidated code and has enabled distributed parallel computing. This paper reports the application of TRAC-M and the ECI at Purdue University to a series of practical nuclear reactor problems. The performance of the consolidated code is studied on a shared memory machine, a DEC Alpha 8400, in which a Large Break Loss of Coolant Accident (LBLOCA) analysis is applied for the safety analysis of the new generation reactor, AP600. The problem demonstrates the importance of balancing the computational load for practical applications. Other computational platforms are also examined, including Linux and Windows implementations on multiprocessor PCs. In general, the parallel performance on UNIX and Linux platforms is found to be the most stable and efficient.

  16. Parallel Stitching of Two-Dimensional Materials

    NASA Astrophysics Data System (ADS)

    Ling, Xi; Lin, Yuxuan; Dresselhaus, Mildred; Palacios, Tomás; Kong, Jing; Department of Electrical Engineering; Computer Science, Massachusetts Institute of Technology Team

    Large scale integration of atomically thin metals (e.g. graphene), semiconductors (e.g. transition metal dichalcogenides (TMDs)), and insulators (e.g. hexagonal boron nitride) is critical for constructing the building blocks for future nanoelectronics and nanophotonics. However, the construction of in-plane heterostructures, especially between two atomic layers with large lattice mismatch, can be extremely difficult due to the strict requirement of spatial precision and the lack of a selective etching method. Here, we developed a general synthesis methodology to achieve both vertical and in-plane "parallel stitched" heterostructures between 2D and TMD materials, which enables both multifunctional electronic/optoelectronic devices and their large scale integration. This is achieved via selective "sowing" of aromatic molecule seeds during the chemical vapor deposition growth. MoS2 is used as a model system to form heterostructures with diverse other 2D materials. Direct and controllable synthesis of large-scale parallel stitched graphene-MoS2 heterostructures was further investigated. Unique nanometer-scale overlapped junctions were obtained at the parallel stitched interface; these are highly desirable both as metal-semiconductor contacts and as functional devices/systems, such as for use in logic integrated circuits (ICs) and broadband photodetectors.

  17. Parallel computation of radio listening rates

    NASA Astrophysics Data System (ADS)

    Mazzariol, Marc; Gennart, Benoit A.; Hersch, Roger D.; Gomez, Manuel; Balsiger, Peter; Pellandini, Fausto; Leder, Markus; Wuethrich, Daniel; Feitknecht, Juerg

    2000-10-01

    Obtaining the listening rates of radio stations as a function of time is an important instrument for determining the impact of publicity. Since many radio stations are financed by publicity, the exact determination of radio listening rates is vital to their existence and to further development. Existing methods of determining radio listening rates are based on face-to-face interviews or telephone interviews made with a sample population. These traditional methods, however, require the cooperation and compliance of the participants. In order to significantly improve the determination of radio listening rates, special watches were created which incorporate a custom integrated circuit sampling the ambient sound for a few seconds every minute. Each watch accumulates these compressed sound samples during one full week. Watches are then sent to an evaluation center, where the sound samples are matched with the sound samples recorded from candidate radio stations. The present paper describes the processing steps necessary for computing the radio listening rates, and shows how this application was parallelized on a cluster of PCs using the CAP computer-aided parallelization framework. Since the application must run in a production environment, the paper also describes the support provided for graceful degradation in case of transient or permanent failure of one of the system's components. The parallel sound matching server offers a linear speedup up to a large number of processing nodes thanks to the fact that disk access operations across the network are pipelined with computations.

  18. Parallel Quantum Circuit in a Tunnel Junction

    NASA Astrophysics Data System (ADS)

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian; GNS theory group Team

    In between two metallic nanopads, adding identical and independent electron transfer paths in parallel increases the effective electronic coupling between the two nanopads through the quantum circuit defined by those paths. Measuring this increase of effective coupling using the tunnelling current intensity leads, for example for two paths in parallel, to the now standard conductance superposition law G = G1 + G2 + 2√(G1·G2) (1). This is only valid in the tunnelling regime (2). For large electronic coupling to the nanopads (or at resonance), G can saturate and even decay as a function of the number of parallel paths added in the quantum circuit (3). We provide here the explanation of this phenomenon: the measurement of the effective Rabi oscillation frequency using the current intensity is constrained by the normalization principle of quantum mechanics. This limits the quantum conductance G, for example, to G0 when there is only one channel per metallic nanopad. This effect has important consequences for the design of Boolean logic gates at the atomic scale using atomic-scale or intramolecular circuits. This work was financially supported by the European PAMS project.
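
    A quick numeric check of the quoted superposition law (a plain sketch of ours) shows why adding paths is more than additive in the tunnelling regime:

```python
from math import sqrt

# G = G1 + G2 + 2*sqrt(G1*G2): constructive interference of two parallel paths.
def g_parallel(g1, g2):
    return g1 + g2 + 2 * sqrt(g1 * g2)

g = 1e-6                      # arbitrary units
print(g_parallel(g, g) / g)   # -> 4.0: two identical paths quadruple G, not double it
```

    The quadrupling (rather than the classical doubling of two resistors in parallel) is precisely the interference effect the saturation argument above constrains at large coupling.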

  19. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  20. Parallel Proximity Detection for Computer Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1998-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
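
    A minimal sketch (our illustration, not the patented implementation) of the fuzzy-grid idea: pad every registration by a fuzzy resolution margin, so objects check in and out of cells without ever computing exact grid-crossing times:

```python
# Fuzzy grid check-in: objects register with every cell their fuzz-padded
# extent touches; proximity queries then only consult local cells.
from collections import defaultdict

CELL = 10.0   # grid cell size (illustrative)
FUZZ = 1.0    # fuzzy resolution parameter (illustrative)

def cells(x, y, radius):
    r = radius + FUZZ
    x0, x1 = int((x - r) // CELL), int((x + r) // CELL)
    y0, y1 = int((y - r) // CELL), int((y + r) // CELL)
    return {(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)}

grid = defaultdict(set)                      # cell -> ids of covering sensors

def sensor_check_in(sensor_id, x, y, coverage_radius):
    for c in cells(x, y, coverage_radius):
        grid[c].add(sensor_id)

def sensors_near_mover(x, y):
    return set().union(*(grid[c] for c in cells(x, y, 0.0)))

sensor_check_in("radar-1", 25.0, 25.0, 12.0)
print(sensors_near_mover(24.0, 30.0))        # {'radar-1'}
```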

  1. Parallel Proximity Detection for Computer Simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1997-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  2. PARAVT: Parallel Voronoi Tessellation code

    NASA Astrophysics Data System (ADS)

    Gonzalez, Roberto E.

    2016-01-01

    We present a new open source code for massively parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is aimed at astrophysical applications, where VT densities and neighbor lists are widely used. There are several serial Voronoi tessellation codes; however, no open source, parallel implementations are available to handle the large numbers of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI, and the VT is computed using the Qhull library. The domain decomposition takes into account consistent boundary computation between tasks and supports periodic conditions. In addition, the code computes neighbor lists, the Voronoi density, and Voronoi cell volumes for each particle, and can compute the density on a regular grid.
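
    On a single process, the per-particle quantities described here can be reproduced with SciPy's Qhull bindings; a hedged sketch (PARAVT itself adds the MPI domain decomposition on top of Qhull):

```python
# Voronoi cell volumes and a density estimate per particle via Qhull (SciPy).
import numpy as np
from scipy.spatial import Voronoi, ConvexHull

pts = np.random.rand(200, 3)
vor = Voronoi(pts)

volumes = np.full(len(pts), np.nan)
for i, region_idx in enumerate(vor.point_region):
    region = vor.regions[region_idx]
    if -1 in region or not region:        # skip unbounded boundary cells
        continue
    volumes[i] = ConvexHull(vor.vertices[region]).volume

density = 1.0 / volumes                   # VT density estimate per particle
print(np.nanmean(density))
```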

  3. Massively parallel MRI detector arrays

    NASA Astrophysics Data System (ADS)

    Keil, Boris; Wald, Lawrence L.

    2013-04-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays.

  4. Massively Parallel MRI Detector Arrays

    PubMed Central

    Keil, Boris; Wald, Lawrence L

    2013-01-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758

  5. Massively parallel MRI detector arrays.

    PubMed

    Keil, Boris; Wald, Lawrence L

    2013-04-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called "ultimate" SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758

  6. Fast data parallel polygon rendering

    SciTech Connect

    Ortega, F.A.; Hansen, C.D.

    1993-09-01

    This paper describes a parallel method for polygonal rendering on a massively parallel SIMD machine. This method, based on a simple shading model, is targeted at applications which require very fast polygon rendering for extremely large sets of polygons, such as is found in many scientific visualization applications. The algorithms described in this paper are incorporated into a library of 3D graphics routines written for the Connection Machine. The routines are implemented on both the CM-200 and the CM-5. This library enables scientists to display 3D shaded polygons directly from a parallel machine without the need to transmit huge amounts of data to a post-processing rendering system.

  7. Parallel integrated frame synchronizer chip

    NASA Technical Reports Server (NTRS)

    Ghuman, Parminder Singh (Inventor); Solomon, Jeffrey Michael (Inventor); Bennett, Toby Dennis (Inventor)

    2000-01-01

    A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.

  8. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.

  9. Visualizing Parallel Computer System Performance

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.

    1988-01-01

    Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels, it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkably adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized. Large, complex parallel systems pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eliding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.

  10. Parallel transport of long mean-free-path plasma along open magnetic field lines: Parallel heat flux

    SciTech Connect

    Guo Zehua; Tang Xianzhu

    2012-06-15

    In a long mean-free-path plasma where temperature anisotropy can be sustained, the parallel heat flux has two components with one associated with the parallel thermal energy and the other the perpendicular thermal energy. Due to the large deviation of the distribution function from local Maxwellian in an open field line plasma with low collisionality, the conventional perturbative calculation of the parallel heat flux closure in its local or non-local form is no longer applicable. Here, a non-perturbative calculation is presented for a collisionless plasma in a two-dimensional flux expander bounded by absorbing walls. Specifically, closures of previously unfamiliar form are obtained for ions and electrons, which relate two distinct components of the species parallel heat flux to the lower order fluid moments such as density, parallel flow, parallel and perpendicular temperatures, and the field quantities such as the magnetic field strength and the electrostatic potential. The plasma source and boundary condition at the absorbing wall enter explicitly in the closure calculation. Although the closure calculation does not take into account wave-particle interactions, the results based on passing orbits from steady-state collisionless drift-kinetic equation show remarkable agreement with fully kinetic-Maxwell simulations. As an example of the physical implications of the theory, the parallel heat flux closures are found to predict a surprising observation in the kinetic-Maxwell simulation of the 2D magnetic flux expander problem, where the parallel heat flux of the parallel thermal energy flows from low to high parallel temperature region.

  11. Utilizing parallel optimization in computational fluid dynamics

    NASA Astrophysics Data System (ADS)

    Kokkolaras, Michael

    1998-12-01

    General problems of interest in computational fluid dynamics are investigated by means of optimization. Specifically, in the first part of the dissertation, a method of optimal incremental function approximation is developed for the adaptive solution of differential equations. Various concepts and ideas utilized by numerical techniques employed in computational mechanics and artificial neural networks (e.g. function approximation and error minimization, variational principles and weighted residuals, and adaptive grid optimization) are combined to formulate the proposed method. The basis functions and associated coefficients of a series expansion, representing the solution, are optimally selected by a parallel direct search technique at each step of the algorithm according to appropriate criteria; the solution is built sequentially. In this manner, the proposed method is adaptive in nature, although a grid is neither built nor adapted in the traditional sense using a-posteriori error estimates. Variational principles are utilized for the definition of the objective function to be extremized in the associated optimization problems, ensuring that the problem is well-posed. Complicated data structures and expensive remeshing algorithms and systems solvers are avoided. Computational efficiency is increased by using low-order basis functions and concurrent computing. Numerical results and convergence rates are reported for a range of steady-state problems, including linear and nonlinear differential equations associated with general boundary conditions, and illustrate the potential of the proposed method. Fluid dynamics applications are emphasized. Conclusions are drawn by discussing the method's limitations, advantages, and possible extensions. The second part of the dissertation is concerned with the optimization of the viscous-inviscid-interaction (VII) mechanism in an airfoil flow analysis code. The VII mechanism is based on the concept of a transpiration velocity

  12. Hybrid parallel programming with MPI and Unified Parallel C.

    SciTech Connect

    Dinan, J.; Balaji, P.; Lusk, E.; Sadayappan, P.; Thakur, R.; Mathematics and Computer Science; The Ohio State Univ.

    2010-01-01

    The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance; using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33 and using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.
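
    UPC itself cannot be shown in the sketch language used here, but the grouping step at the heart of the hybrid model has a close MPI analogue: splitting the world communicator into static sub-groups that each span a few nodes. A conceptual mpi4py sketch (the group size is an illustrative assumption):

```python
# Conceptual analogue of "UPC groups connected over MPI": one static
# sub-communicator per group of ranks, with the world communicator
# linking the groups.
from mpi4py import MPI

world = MPI.COMM_WORLD
ranks_per_group = 4                              # e.g. ranks sharing two nodes
color = world.Get_rank() // ranks_per_group      # group id
group = world.Split(color, key=world.Get_rank())

# Memory-heavy shared data would live within `group`;
# inter-group traffic goes over `world`.
print(f"world rank {world.Get_rank()} -> group {color}, group rank {group.Get_rank()}")
```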

  13. Parallel algorithms for mapping pipelined and parallel computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm³) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm²) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.

  14. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-12-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processes. User programs and their gangs of processes are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory.

  15. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-03-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processes. User programs and their gangs of processes are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.

  16. ITER LHe Plants Parallel Operation

    NASA Astrophysics Data System (ADS)

    Fauve, E.; Bonneton, M.; Chalifour, M.; Chang, H.-S.; Chodimella, C.; Monneret, E.; Vincent, G.; Flavien, G.; Fabre, Y.; Grillot, D.

    The ITER Cryogenic System includes three identical liquid helium (LHe) plants, with a total average cooling capacity equivalent to 75 kW at 4.5 K. The LHe plants provide the 4.5 K cooling power to the magnets and cryopumps. They are designed to operate in parallel and to handle heavy load variations. In this proceeding we describe the present status of the ITER LHe plants with emphasis on i) the project schedule, ii) the plants' characteristics/layout and iii) the basic principles and control strategies for a stable operation of the three LHe plants in parallel.

  17. Medipix2 parallel readout system

    NASA Astrophysics Data System (ADS)

    Fanti, V.; Marzeddu, R.; Randaccio, P.

    2003-08-01

    A fast parallel readout system based on a PCI board has been developed in the framework of the Medipix collaboration. The readout electronics consists of two boards: the motherboard directly interfacing the Medipix2 chip, and the PCI board with digital I/O ports 32 bits wide. The device driver and readout software have been developed at low level in Assembler to allow fast data transfer and image reconstruction. The parallel readout permits a transfer rate up to 64 Mbytes/s. http://medipix.web.cern.ch/MEDIPIX/

  18. Parallelization of the SIR code

    NASA Astrophysics Data System (ADS)

    Thonhofer, S.; Bellot Rubio, L. R.; Utz, D.; Jurčak, J.; Hanslmeier, A.; Piantschitsch, I.; Pauritsch, J.; Lemmerer, B.; Guttenbrunner, S.

    A high-resolution 3-dimensional model of the photospheric magnetic field is essential for the investigation of small-scale solar magnetic phenomena. The SIR code is an advanced Stokes-inversion code that deduces physical quantities, e.g. the magnetic field vector, temperature, and LOS velocity, from spectropolarimetric data. We extended this code with the capability of directly using large data sets and inverting the pixels in parallel. Due to this parallelization it is now feasible to apply the code directly to extensive data sets. In addition, we included the possibility of using different initial model atmospheres for the inversion, which enhances the quality of the results.
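
    Because each pixel's Stokes profiles are inverted independently, the parallelization is embarrassingly parallel over pixels. A hedged stand-in for the pattern (invert_pixel is hypothetical, not SIR's interface):

```python
# Pixel-parallel inversion pattern: map an inversion routine over pixels.
from multiprocessing import Pool

def invert_pixel(task):
    x, y, stokes_profiles = task
    # ... invoke the Stokes inversion for this pixel's profiles here ...
    model_atmosphere = {"B": 0.0, "T": 0.0, "v_los": 0.0}   # placeholder
    return (x, y, model_atmosphere)

if __name__ == "__main__":
    tasks = [(x, y, None) for x in range(64) for y in range(64)]
    with Pool() as pool:
        results = pool.map(invert_pixel, tasks, chunksize=64)
    print(len(results), "pixels inverted")
```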

  19. The design philosophy of the Chare kernel parallel programming system

    SciTech Connect

    Kale, L.V. )

    1989-01-01

    A variety of new parallel machines, some with global shared memory, some without, with a few tens to a few thousands of processors, are emerging. How can one develop techniques and methods to program this bewildering variety of machines? In this paper the authors propose a methodology for developing machine-independent programs for all MIMD machines. They show that all common approaches to this problem - parallelizing compilers, high-level languages such as functional languages, and explicitly parallel languages - require a common base of support. This support can be encapsulated in a language that abstracts over resource management decisions and machine details. This language can then be used by implementors of other high-level approaches to parallel programming as a universal and efficient back-end. It can also be used for efficient application programming. The requirements for such a language are defined, and the language supported by the chare kernel system is described as a satisfactory language for this purpose.

  20. Parallelizing AT with MatlabMPI

    SciTech Connect

    Li, Evan Y.; /Brown U. /SLAC

    2011-06-22

    The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, providing the necessary prerequisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate remarkably efficient speed increments per processor in AT's beam-tracking functions. Extrapolating from these predictions, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while

  1. Parallel, Distributed Scripting with Python

    SciTech Connect

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadmin tools such as password crackers, file purgers, etc. These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000-word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and coordinate the work.
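
    The password-checker example parallelizes exactly as described: split the dictionary, check the chunks concurrently, gather the hits. A hedged sketch with the standard library (hashlib stands in for the crypt-style hash, and the report's own message-passing layer is not shown):

```python
# Distribute the dictionary across workers; each worker checks its chunk.
import hashlib
from multiprocessing import Pool

TARGET = hashlib.sha256(b"hunter2").hexdigest()    # the "encrypted" password

def check_chunk(words):
    return [w for w in words if hashlib.sha256(w.encode()).hexdigest() == TARGET]

if __name__ == "__main__":
    dictionary = ["password", "letmein", "hunter2", "qwerty"] * 6250  # ~25k words
    chunks = [dictionary[i::8] for i in range(8)]   # distribute the information
    with Pool(8) as pool:
        hits = [w for part in pool.map(check_chunk, chunks) for w in part]
    print(hits[:1])
```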

  2. Coupled parallel waveguide semiconductor laser

    NASA Technical Reports Server (NTRS)

    Katz, J.; Kapon, E.; Lindsey, C.; Rav-Noy, Z.; Margalit, S.; Yariv, A.; Mukai, S.

    1984-01-01

    The operation of a new type of tunable laser, where the two separately controlled individual lasers are placed vertically in parallel, has been demonstrated. One of the cavities ('control' cavity) is operated below threshold and assists the longitudinal mode selection and tuning of the other laser. With a minor modification, the same device can operate as an independent two-wavelength laser source.

  3. File concepts for parallel I/O

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1989-01-01

    The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, and implementations using multiple storage devices are suggested. Problem areas are also identified and discussed.

  4. Cluster-based parallel image processing toolkit

    NASA Astrophysics Data System (ADS)

    Squyres, Jeffery M.; Lumsdaine, Andrew; Stevenson, Robert L.

    1995-03-01

    Many image processing tasks exhibit a high degree of data locality and parallelism and map quite readily to specialized massively parallel computing hardware. However, as network technologies continue to mature, workstation clusters are becoming a viable and economical parallel computing resource, so it is important to understand how to use these environments for parallel image processing as well. In this paper we discuss our implementation of a parallel image processing software library (the Parallel Image Processing Toolkit). The Toolkit uses a message-passing model of parallelism designed around the Message Passing Interface (MPI) standard. Experimental results are presented to demonstrate the parallel speedup obtained with the Parallel Image Processing Toolkit in a typical workstation cluster over a wide variety of image processing tasks. We also discuss load balancing and the potential for parallelizing portions of image processing tasks that seem to be inherently sequential, such as visualization and data I/O.
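
    The Toolkit itself is built on MPI; the toy sketch below (not the Toolkit's API) illustrates the same row-wise data decomposition with Python's multiprocessing standing in for message passing: each worker smooths its strip of the image and the strips are reassembled:

        import numpy as np
        from multiprocessing import Pool

        def smooth_strip(strip):
            # 1-2-1 vertical smoothing within a strip; boundary rows are left
            # untouched for brevity (a real decomposition would exchange
            # ghost rows with neighboring workers).
            out = strip.copy()
            out[1:-1] = (strip[:-2] + 2 * strip[1:-1] + strip[2:]) / 4.0
            return out

        if __name__ == "__main__":
            image = np.random.default_rng(0).random((512, 512))
            strips = np.array_split(image, 4, axis=0)  # one strip per worker
            with Pool(4) as pool:
                result = np.vstack(pool.map(smooth_strip, strips))
            print(result.shape)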

  5. Parallel global optimization with the particle swarm algorithm

    PubMed Central

    Schutte, J. F.; Reinbolt, J. A.; Fregly, B. J.; Haftka, R. T.; George, A. D.

    2007-01-01

    Present-day engineering optimization problems often impose large computational demands, resulting in long solution times even on a modern high-end processor. To obtain enhanced computational throughput and global search capability, we detail the coarse-grained parallelization of an increasingly popular global search method, the particle swarm optimization (PSO) algorithm. Parallel PSO performance was evaluated using two categories of optimization problems possessing multiple local minima: large-scale analytical test problems with computationally cheap function evaluations, and medium-scale biomechanical system identification problems with computationally expensive function evaluations. For load-balanced analytical test problems formulated using 128 design variables, speedup was close to ideal and parallel efficiency above 95% for up to 32 nodes on a Beowulf cluster. In contrast, for load-imbalanced biomechanical system identification problems with 12 design variables, speedup plateaued and parallel efficiency decreased almost linearly with increasing number of nodes. The primary factor affecting parallel performance was the synchronization requirement of the parallel algorithm, which dictated that each iteration must wait for completion of the slowest fitness evaluation. When the analytical problems were solved using a fixed number of swarm iterations, a single population of 128 particles produced a better convergence rate than did multiple independent runs performed using sub-populations (8 runs with 16 particles, 4 runs with 32 particles, or 2 runs with 64 particles). These results suggest that (1) parallel PSO exhibits excellent parallel performance under load-balanced conditions, (2) an asynchronous implementation would be valuable for real-life problems subject to load imbalance, and (3) larger population sizes should be considered when multiple processors are available. PMID:17891226
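
    A minimal synchronous parallel PSO sketch (illustrative, not the authors' implementation): each iteration farms all fitness evaluations out to a process pool and then waits for the slowest one, which is exactly the synchronization cost the abstract identifies; the objective, coefficients, and sizes below are placeholders:

        import numpy as np
        from multiprocessing import Pool

        def fitness(x):
            return float(np.sum(x**2))  # stand-in objective (minimized)

        def pso(dim=12, n_particles=32, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
            rng = np.random.default_rng(seed)
            x = rng.uniform(-5, 5, (n_particles, dim))
            v = np.zeros_like(x)
            pbest, pbest_f = x.copy(), np.full(n_particles, np.inf)
            with Pool() as pool:
                for _ in range(iters):
                    # Parallel fitness evaluation; implicit barrier per iteration.
                    f = np.array(pool.map(fitness, x))
                    improved = f < pbest_f
                    pbest[improved], pbest_f[improved] = x[improved], f[improved]
                    g = pbest[np.argmin(pbest_f)]          # global best
                    r1, r2 = rng.random(x.shape), rng.random(x.shape)
                    v = w*v + c1*r1*(pbest - x) + c2*r2*(g - x)
                    x = x + v
            return pbest[np.argmin(pbest_f)], pbest_f.min()

        if __name__ == "__main__":
            best_x, best_f = pso()
            print(best_f)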

  6. HELENA code with incompressible parallel plasma flow

    NASA Astrophysics Data System (ADS)

    Throumoulopoulos, George; Poulipoulis, George; Konz, Christian; EFDA ITM-TF Team

    2015-11-01

    It has been established that plasma rotation, in connection with both zonal and equilibrium flow, can play a role in the transitions to the advanced confinement regimes in tokamaks, such as the L-H transition and the formation of Internal Transport Barriers. For incompressible rotation the equilibrium is governed by a generalized Grad-Shafranov (GGS) equation and a decoupled Bernoulli-type equation for the pressure. For parallel flow the GGS equation can be transformed to one identical in form with the usual Grad-Shafranov equation. In the present study, on the basis of the latter equation, we have extended HELENA, a fixed-boundary equilibrium solver integrated in the ITM-TF modeling infrastructure. The extended code solves the GGS equation for a variety of the two free-surface-function terms involved, for arbitrary Alfvén Mach functions. We have constructed diverted-boundary equilibria pertinent to ITER and examined their characteristics, in particular the impact of rotation. It turns out that rotation noticeably affects the pressure and toroidal current density, with the impact on the current density being stronger in the parallel direction than in the toroidal one. The linear stability of the constructed equilibria is also examined. This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the National Programme for Controlled Thermonuclear Fusion, Hellenic Republic.
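
    For orientation, the usual static Grad-Shafranov equation, to which the abstract says the parallel-flow GGS equation reduces in form, reads as follows (a standard-form sketch in LaTeX; in the transformed GGS version a modified flux function and redefined free surface functions take the places of psi, p, and F):

        \Delta^{*}\psi \equiv R\,\frac{\partial}{\partial R}\!\left(\frac{1}{R}\,\frac{\partial\psi}{\partial R}\right)
          + \frac{\partial^{2}\psi}{\partial z^{2}}
          = -\mu_{0} R^{2}\,\frac{dp}{d\psi} - F\,\frac{dF}{d\psi}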

  7. Parallel tetrahedral mesh refinement with MOAB.

    SciTech Connect

    Thompson, David C.; Pebay, Philippe Pierre

    2008-12-01

    In this report, we present the novel functionality of parallel tetrahedral mesh refinement which we have implemented in MOAB. This report details work done to implement parallel, edge-based, tetrahedral refinement into MOAB. The theoretical basis for this work is contained in [PT04, PT05, TP06] while information on design, performance, and operation specific to MOAB are contained herein. As MOAB is intended mainly for use in pre-processing and simulation (as opposed to the post-processing bent of previous papers), the primary use case is different: rather than refining elements with non-linear basis functions, the goal is to increase the number of degrees of freedom in some region in order to more accurately represent the solution to some system of equations that cannot be solved analytically. Also, MOAB has a unique mesh representation which impacts the algorithm. This introduction contains a brief review of streaming edge-based tetrahedral refinement. The remainder of the report is broken into three sections: design and implementation, performance, and conclusions. Appendix A contains instructions for end users (simulation authors) on how to employ the refiner.

  8. Predicting mining activity with parallel genetic algorithms

    USGS Publications Warehouse

    Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.

    2005-01-01

    We explore several different techniques in our quest to improve the overall performance of a genetic-algorithm-calibrated probabilistic cellular automaton. We use the Kappa statistic to measure agreement between ground-truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness, and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model, and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.
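
    A hedged sketch of the Kappa statistic used above to score agreement between the ground-truth and predicted maps (Cohen's kappa over category maps; the tiny example arrays are illustrative, not the study's data):

        import numpy as np

        def cohens_kappa(truth, pred):
            truth, pred = np.asarray(truth).ravel(), np.asarray(pred).ravel()
            cats = np.union1d(truth, pred)
            # Observed agreement: fraction of cells where the maps match.
            p_o = np.mean(truth == pred)
            # Expected agreement from the marginal category frequencies.
            p_e = sum(np.mean(truth == c) * np.mean(pred == c) for c in cats)
            return (p_o - p_e) / (1.0 - p_e)

        print(cohens_kappa([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))  # 0.615...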

  9. Low Mach number parallel and quasi-parallel shocks

    NASA Technical Reports Server (NTRS)

    Omidi, N.; Quest, K. B.; Winske, D.

    1990-01-01

    The properties of low-Mach-number parallel and quasi-parallel shocks are studied using the results of one-dimensional hybrid simulations. It is shown that both the structure and the ion dissipation at the shocks differ considerably. In the parallel limit, the shock remains coupled to the piston and consists of large-amplitude magnetosonic-whistler waves in the upstream, through the shock, and into the downstream region, where the waves eventually damp out. These waves are generated by an ion beam instability due to the interaction between the incident and piston-reflected ions. The excited waves decelerate the plasma sufficiently that it becomes stable far into the downstream. The increase in ion temperature along the shock normal in the downstream region is due to the superposition of incident and piston-reflected ions. These two populations of ions remain distinct through the downstream region. While they are both gyrophase-bunched, their counterstreaming nature results in a 180-deg phase shift in their perpendicular velocities.

  10. Supporting data intensive applications with medium grained parallelism

    SciTech Connect

    Pfaltz, J.L.; French, J.C.; Grimshaw, A.S.; Son, S.H.

    1992-04-01

    ADAMS is an ambitious effort to provide new database access paradigms for the kinds of scientific applications that require massively parallel access to very large data sets in order to be effective. Many of the Grand Challenge Problems fall into this category, as well as those kinds of scientific research which depend on widely distributed shared sets of disparate data. The essence of the ADAMS approach is to view data purely in functional terms, rather than the more traditional structural view in which multiple data items are aggregated into records or tuples of flat files. Further, ADAMS has been implemented as an embedded interface so that scientists can develop applications in the host programming language of their choice, often Fortran, Pascal, or C, and still access shared data generated in other environments. The syntax and semantics of ADAMS are essentially complete. The functional nature of the ADAMS data interface paradigm simplifies its implementation in a distributed environment, e.g., the Mentat run-time system, because one must only distribute functional servers, not pieces of data structures. However, this only opens up the possibility of effective parallel database processing; to realize this potential, far more work must be done in the areas of data dependence, intra-statement parallelism, parallel query optimization, and maintaining consistency and reliability in concurrent systems. Discovering how to make effective parallel data access an actuality in real scientific applications is the point of this research.

  11. Merlin - Massively parallel heterogeneous computing

    NASA Technical Reports Server (NTRS)

    Wittie, Larry; Maples, Creve

    1989-01-01

    Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.

  12. Two Level Parallel Grammatical Evolution

    NASA Astrophysics Data System (ADS)

    Ošmera, Pavel

    This paper describes a Two Level Parallel Grammatical Evolution (TLPGE) that can evolve complete programs using a variable-length linear genome to govern the mapping of a Backus-Naur Form grammar definition. To increase the efficiency of Grammatical Evolution (GE), the influence of backward processing was tested and a second level with differential evolution was added. The significance of backward coding (BC) and a comparison with the standard coding of GEs are presented. The new method is based on parallel grammatical evolution (PGE) with a backward processing algorithm, which is further extended with a differential evolution algorithm. Thus a two-level optimization method was formed in an attempt to take advantage of the benefits of both original methods and avoid their difficulties. Both methods used are discussed and the architecture of their combination is described. An application is also discussed, and results on a real-world problem are described.

  13. Template matching on parallel architectures

    SciTech Connect

    Sher

    1985-07-01

    Many important problems in computer vision can be characterized as template-matching problems on edge images. Some examples are circle detection and line detection. Two techniques for template matching are the Hough transform and correlation. There are two algorithms for correlation: a shift-and-add-based technique and a Fourier-transform-based technique. The most efficient algorithm of these three varies depending on the size of the template and the structure of the image. On different parallel architectures, the choice of algorithm for a specific problem differs as well. This paper describes two parallel architectures, the WARP and the Butterfly, and explains why and how the criterion for choosing among the algorithms differs between the two machines.
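
    The trade-off between the two correlation algorithms can be seen directly in a toy setup (a sketch using SciPy, not the paper's WARP or Butterfly code); both methods produce the same correlation surface, and which is faster depends on template size and image structure, as the abstract notes:

        import numpy as np
        from scipy.signal import correlate

        rng = np.random.default_rng(0)
        image = rng.random((256, 256))      # stand-in for an edge image
        template = rng.random((8, 8))       # small template: direct often wins
        direct = correlate(image, template, mode="valid", method="direct")
        fft = correlate(image, template, mode="valid", method="fft")
        print(np.allclose(direct, fft, atol=1e-8))  # same result, different cost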

  14. Parallel supercomputing with commodity components

    SciTech Connect

    Warren, M.S.; Goda, M.P.; Becker, D.J.

    1997-09-01

    We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10{sup 15} floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.

  15. Parallel processing spacecraft communication system

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)

    1998-01-01

    An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used, and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID, and each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If the channel number is zero, however, it indicates that the frame of data represents a critical command only; that data is handled in a special way, independent of the software. Otherwise, the processed data is further handled using special double-buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.

  16. Parallel multiplex laser feedback interferometry

    SciTech Connect

    Zhang, Song; Tan, Yidong; Zhang, Shulian

    2013-12-15

    We present a parallel multiplex laser feedback interferometer based on spatial multiplexing which avoids the signal crosstalk of earlier multiplex feedback interferometers. The interferometer outputs two close parallel laser beams, whose frequencies are simultaneously shifted by 2Ω by two acousto-optic modulators. A static reference mirror is inserted into one of the optical paths as the reference optical path. The other beam impinges on the target as the measurement optical path. Phase variations of the two feedback laser beams are measured simultaneously through heterodyne demodulation with two different detectors. Their subtraction accurately reflects the target displacement. Under typical room conditions, experimental results show a resolution of 1.6 nm and an accuracy of 7.8 nm within a range of 100 μm.

  17. A parallel graph coloring heuristic

    SciTech Connect

    Jones, M.T.; Plassmann, P.E.

    1993-05-01

    The problem of computing good graph colorings arises in many diverse applications, such as in the estimation of sparse Jacobians and in the development of efficient, parallel iterative methods for solving sparse linear systems. This paper presents an asynchronous graph coloring heuristic well suited to distributed memory parallel computers. Experimental results obtained on an Intel iPSC/860 are presented, which demonstrate that, for graphs arising from finite element applications, the heuristic exhibits scalable performance and generates colorings usually within three or four colors of the best-known linear-time sequential heuristics. For bounded-degree graphs, it is shown that the expected running time of the heuristic under the PRAM computation model is bounded by O(log(n)/log log(n)). This bound is an improvement over the previously known best upper bound for the expected running time of a random heuristic for the graph coloring problem.
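
    A compact serial emulation of the parallel rounds in this style of heuristic (in the spirit of the paper's approach; the graph and random priorities are illustrative): each vertex colors itself, with the smallest available color, in the round where its random number beats those of all uncolored neighbors, so adjacent vertices never color simultaneously:

        import random

        def parallel_style_coloring(adj, seed=0):
            rng = random.Random(seed)
            rho = {v: rng.random() for v in adj}   # random priorities
            color, uncolored = {}, set(adj)
            while uncolored:
                # Local maxima among uncolored vertices color "simultaneously".
                ready = [v for v in uncolored
                         if all(rho[v] > rho[u] for u in adj[v] if u in uncolored)]
                for v in ready:
                    used = {color[u] for u in adj[v] if u in color}
                    color[v] = min(c for c in range(len(adj)) if c not in used)
                uncolored -= set(ready)
            return color

        graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
        print(parallel_style_coloring(graph))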

  18. Instruction-level parallel processing.

    PubMed

    Fisher, J A; Rau, R

    1991-09-13

    The performance of microprocessors has increased steadily over the past 20 years at a rate of about 50% per year. This is the cumulative result of architectural improvements as well as increases in circuit speed. Moreover, this improvement has been obtained in a transparent fashion, that is, without requiring programmers to rethink their algorithms and programs, thereby enabling the tremendous proliferation of computers that we see today. To continue this performance growth, microprocessor designers have incorporated instruction-level parallelism (ILP) into new designs. ILP utilizes the parallel execution of the lowest level computer operations - adds, multiplies, loads, and so on - to increase performance transparently. The use of ILP promises to make possible, within the next few years, microprocessors whose performance is many times that of a CRAY-1S. This article provides an overview of ILP, with an emphasis on ILP architectures - superscalar, VLIW, and dataflow processors - and the compiler techniques necessary to make ILP work well. PMID:17831442

  19. Parallel supercomputing with commodity components

    NASA Technical Reports Server (NTRS)

    Warren, M. S.; Goda, M. P.; Becker, D. J.

    1997-01-01

    We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10(sup 15) floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.

  20. A new parallel simulation technique

    NASA Astrophysics Data System (ADS)

    Blanco-Pillado, Jose J.; Olum, Ken D.; Shlaer, Benjamin

    2012-01-01

    We develop a "semi-parallel" simulation technique suggested by Pretorius and Lehner, in which the simulation spacetime volume is divided into a large number of small 4-volumes that have only initial and final surfaces. Thus there is no two-way communication between processors, and the 4-volumes can be simulated independently and potentially at different times. This technique allows us to simulate much larger volumes than we otherwise could, because we are not limited by total memory size. No processor time is lost waiting for other processors. We compare a cosmic string simulation we developed using the semi-parallel technique with our previous MPI-based code for several test cases and find a factor of 2.6 improvement in the total amount of processor time required to accomplish the same job for strings evolving in the matter-dominated era.

  1. Scans as primitive parallel operations

    SciTech Connect

    Blelloch, G.E. (Dept. of Computer Science)

    1989-11-01

    In most parallel random access machine (PRAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can execute in no more time than these parallel memory references. This paper outlines an extensive study of the effect of including such scan operations as unit-time primitives in the PRAM models. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(log n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. The authors argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design. These all run on an EREW PRAM with the addition of two scan primitives.
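
    A small illustration (in Python, for concreteness) of the scan primitive and the kind of algorithm description it simplifies: an exclusive plus-scan over 0/1 flags yields each flagged element's output position in a pack/compress operation:

        from itertools import accumulate

        def exclusive_scan(xs):
            # Exclusive prefix sum: result[i] = xs[0] + ... + xs[i-1].
            return [0] + list(accumulate(xs))[:-1]

        flags = [1, 0, 1, 1, 0, 1]
        positions = exclusive_scan(flags)   # [0, 1, 1, 2, 3, 3]
        packed = [i for i, f in enumerate(flags) if f]
        print(positions, packed)            # flagged item i lands at positions[i]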

  2. Parallel Power Grid Simulation Toolkit

    SciTech Connect

    Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol

    2015-09-14

    ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit, consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete-event parallel simulations. The code is designed using modern object-oriented C++ methods, utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  3. Efficient, massively parallel eigenvalue computation

    NASA Technical Reports Server (NTRS)

    Huo, Yan; Schreiber, Robert

    1993-01-01

    In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the Maspar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
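
    The core numerical task described here is compact to state (a generic sketch with NumPy; the random symmetric ensemble below is illustrative rather than the study's physical Hamiltonian):

        import numpy as np

        rng = np.random.default_rng(0)
        n = 500
        a = rng.standard_normal((n, n))
        h = (a + a.T) / 2.0                  # real symmetric "Hamiltonian"
        evals, evecs = np.linalg.eigh(h)     # dense symmetric eigensolver
        print(evals[:3], evals[-3:])         # edges of the spectrum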

  4. Massively parallel femtosecond laser processing.

    PubMed

    Hasegawa, Satoshi; Ito, Haruyasu; Toyoda, Haruyoshi; Hayasaki, Yoshio

    2016-08-01

    Massively parallel femtosecond laser processing with more than 1000 beams was demonstrated. Parallel beams were generated by a computer-generated hologram (CGH) displayed on a spatial light modulator (SLM). The key to this technique is to optimize the CGH in the laser processing system using a scheme called in-system optimization. It was analytically demonstrated that the number of beams is determined by the horizontal number of pixels in the SLM, N_SLM, that are imaged at the pupil plane of an objective lens, and by a distance parameter p_d obtained by dividing the distance between adjacent beams by the diffraction-limited beam diameter. A performance limitation of parallel laser processing in our system was estimated at an N_SLM of 250 and a p_d of 7.0. Based on these parameters, the maximum number of beams in a hexagonal close-packed structure was calculated to be 1189 by using an analytical equation. PMID:27505815

  5. Highly parallel sparse Cholesky factorization

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1990-01-01

    Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.

  6. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  7. Allinea DDT as a Parallel Debugging Alternative to Totalview

    SciTech Connect

    Antypas, K.B.

    2007-03-05

    Totalview, from the Etnus Corporation, is a sophisticated and feature-rich software debugger for parallel applications. As Totalview has gained in popularity and market share, its pricing model has increased to the point where it is often prohibitively expensive for massively parallel supercomputers. Additionally, many of Totalview's advanced features are not used by members of the scientific computing community. For these reasons, supercomputing centers have begun to search for a basic parallel debugging tool that can serve as a viable alternative to Totalview. DDT (Distributed Debugging Tool) from Allinea Software is a relatively new parallel debugging tool which aims to provide much of the same functionality as Totalview. This review outlines the basic features and limitations of DDT to determine if it can be a reasonable substitute for Totalview. DDT was tested on the NERSC platforms Bassi, Seaborg, Jacquard and Davinci with Fortran90, C, and C++ codes using MPI and OpenMP for parallelism.

  8. Parallel micromanipulation method for microassembly

    NASA Astrophysics Data System (ADS)

    Sin, Jeongsik; Stephanou, Harry E.

    2001-09-01

    Microassembly deals with micron- or millimeter-scale objects where the tolerance requirements are in the micron range. Typical applications include electronics components (silicon-fabricated circuits), optoelectronics components (photodetectors, emitters, amplifiers, optical fibers, microlenses, etc.), and MEMS (Micro-Electro-Mechanical-System) dies. The assembly processes generally require not only high precision but also high throughput at low manufacturing cost. While conventional macroscale assembly methods have been utilized in scaled-down versions for microassembly applications, they exhibit limitations on throughput and cost due to the inherently serialized process. Since the assembly process depends heavily on the manipulation performance, an efficient manipulation method for small parts will have a significant impact on the manufacturing of miniaturized products. The objective of this study on 'parallel micromanipulation' is to achieve these three requirements through the handling of multiple small parts simultaneously (in parallel) with high precision (micromanipulation). As a step toward this objective, a new manipulation method is introduced. The method uses a distributed actuation array for gripper-free and parallel manipulation, and a centralized, shared actuator for simplified controls. The method has been implemented on a testbed, the 'Piezo Active Surface (PAS)', in which an actively generated friction force field is the driving force for part manipulation. Basic motion primitives, such as translation and rotation of objects, are made possible with the proposed method. This study discusses the design of the proposed manipulation method PAS and the corresponding manipulation mechanism. The PAS consists of two piezoelectric actuators for X and Y motion, two linear motion guides, two sets of nozzle arrays, and solenoid valves to switch the pneumatic suction force on and off in individual nozzles. One array of nozzles is fixed relative to the surface on

  9. Rochester checkers player: Multi-model parallel programming for animate vision. Technical report

    SciTech Connect

    Marsh, B.D.; Brown, C.M.; LeBlanc, T.J.; Scott, M.L.; Becker, T.G.

    1991-06-01

    Animate vision systems couple computer vision and robotics to achieve robust and accurate vision, as well as other complex behavior. These systems combine low-level sensory processing and effector output with high-level cognitive planning - all computationally intensive tasks that can benefit from parallel processing. No single model of parallel programming is likely to serve for all tasks, however. Early vision algorithms are intensely data parallel, often utilizing fine-grain parallel computations that share an image, while cognition algorithms decompose naturally by function, often consisting of loosely-coupled, coarse-grain parallel units. A typical animate vision application will likely consist of many tasks, each of which may require a different parallel programming model, and all of which must cooperate to achieve the desired behavior. These multi-model programs require an underlying software system that not only supports several different models of parallel computation simultaneously, but which also allows tasks implemented in different models to interact.

  10. The parallel I/O architecture of the High Performance Storage System (HPSS)

    SciTech Connect

    Watson, R.W.; Coyne, R.A.

    1995-02-01

    Rapid improvements in computational science, processing capability, main memory sizes, data collection devices, multimedia capabilities and integration of enterprise data are producing very large datasets (tens to hundreds of gigabytes, up to terabytes). This rapid growth of data has resulted in a serious imbalance in I/O and storage system performance and functionality. One promising approach to restoring balanced I/O and storage system performance is the use of parallel data transfer techniques for client access to storage, device-to-device transfers, and remote file transfers. This paper describes the parallel I/O architecture and mechanisms, Parallel Transport Protocol, parallel FTP, and parallel client Application Programming Interface (API) used by the High Performance Storage System (HPSS). Parallel storage integration issues with a local parallel file system are also discussed.

  11. Empirical study of parallel LRU simulation algorithms

    NASA Technical Reports Server (NTRS)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. The other two algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The other two SIMD algorithms are more complex, but their costs are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second is completely general, whereas the third presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
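
    For reference, the quantity these parallel algorithms compute can be defined by a short serial procedure (a sketch; 'inf' marks a first touch): the LRU stack distance of a reference is the current depth of its tag in the LRU stack, and a reference hits in any fully associative LRU cache larger than that distance:

        def stack_distances(trace):
            stack, out = [], []
            for tag in trace:
                if tag in stack:
                    d = stack.index(tag)        # depth 0 = most recently used
                    stack.remove(tag)
                else:
                    d = float("inf")            # first reference: cold miss
                out.append(d)
                stack.insert(0, tag)            # tag becomes most recently used
            return out

        print(stack_distances(["a", "b", "a", "c", "b", "a"]))
        # [inf, inf, 1, inf, 2, 2]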

  12. Parallel perfusion imaging processing using GPGPU

    PubMed Central

    Zhu, Fan; Gonzalez, David Rodriguez; Carpenter, Trevor; Atkinson, Malcolm; Wardlaw, Joanna

    2012-01-01

    Background and purpose: The objective of brain perfusion quantification is to generate parametric maps of relevant hemodynamic quantities such as cerebral blood flow (CBF), cerebral blood volume (CBV) and mean transit time (MTT) that can be used in the diagnosis of acute stroke. These calculations involve deconvolution operations that can be very computationally expensive when using local Arterial Input Functions (AIF). As time is vitally important in the case of acute stroke, reducing the analysis time will reduce the number of brain cells damaged and increase the potential for recovery. Methods: GPUs originated as dedicated graphics co-processors, but modern GPUs have evolved into more general processors capable of executing scientific computations. They provide a highly parallel computing environment due to their large number of computing cores and constitute an affordable high-performance computing method. In this paper, we present the implementation of a deconvolution algorithm for brain perfusion quantification on GPGPU (General Purpose Graphics Processor Units) using the CUDA programming model. We present the serial and parallel implementations of such algorithms and an evaluation of the performance gains using GPUs. Results: Our method gained speedups of 5.56 and 3.75 for CT and MR images, respectively. Conclusions: Using GPGPU appears to be a desirable approach for perfusion imaging analysis, one that does not harm the quality of cerebral hemodynamic maps but delivers results faster than traditional computation. PMID:22824549

  13. How to write fast and clear parallel programs using algebra

    SciTech Connect

    Stiller, L. (Johns Hopkins Univ., Baltimore, MD)

    1992-01-01

    An algebraic method for the design of efficient and easy to port codes for parallel machines is described. The method was applied to speed up and to clarify certain communication functions, n-body codes, a biomolecular analysis, and a chess problem.

  14. How to write fast and clear parallel programs using algebra

    SciTech Connect

    Stiller, L.

    1992-10-01

    An algebraic method for the design of efficient and easy to port codes for parallel machines is described. The method was applied to speed up and to clarify certain communication functions, n-body codes, a biomolecular analysis, and a chess problem.

  15. Implementing clips on a parallel computer

    NASA Technical Reports Server (NTRS)

    Riley, Gary

    1987-01-01

    The C Language Integrated Production System (CLIPS) is a forward-chaining rule-based language that provides training and delivery for expert systems. Conceptually, rule-based languages have great potential for benefiting from the inherent parallelism of the algorithms that they employ. During each cycle of execution, a knowledge base of information is compared against a set of rules to determine if any rules are applicable. Parallelism can also be employed with multiple cooperating expert systems. To investigate the potential benefits of using a parallel computer to speed up the comparison of facts to rules in expert systems, a parallel version of CLIPS was developed for the FLEX/32, a large-grain parallel computer. The FLEX implementation takes a macroscopic approach in achieving parallelism by splitting whole sets of rules among several processors rather than by splitting the components of an individual rule among processors. The parallel CLIPS prototype demonstrates the potential advantages of integrating expert system tools with parallel computers.

  16. Parallelizing alternating direction implicit solver on GPUs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...

  17. Parallel machine architecture and compiler design facilities

    NASA Technical Reports Server (NTRS)

    Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

    1990-01-01

    The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility for rapid prototyping of parallelizing compilers that can target different machine architectures) is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.

  18. Force user's manual: A portable, parallel FORTRAN

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1990-01-01

    The use of Force, a portable, parallel FORTRAN for shared-memory parallel computers, is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray Y-MP, Convex 220, Flex/32, Encore, Sequent, and Alliant computers on which it is installed.

  19. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  20. High Performance Parallel Computational Nanotechnology

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. Inasmuch as it seems clear that the application of such methods in nanotechnology will require highly powerful computing systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to

  1. Scalable Parallel Algebraic Multigrid Solvers

    SciTech Connect

    Bank, R; Lu, S; Tong, C; Vassilevski, P

    2005-03-23

    The authors propose a parallel algebraic multigrid (AMG) algorithm with the novel feature that the subproblem residing in each processor is defined over the entire partition domain, although the vast majority of unknowns for each subproblem are associated with the partition owned by the corresponding processor. This feature ensures that a global coarse description of the problem is contained within each of the subproblems. The advantages of this approach are that interprocessor communication is minimized in the solution process while an optimal order of convergence rate is preserved, and that the speed of local subproblem solvers can be maximized using the best existing sequential algebraic solvers.

  2. Fault-tolerant parallel processor

    SciTech Connect

    Harper, R.E.; Lala, J.H.

    1991-06-01

    This paper addresses issues central to the design and operation of an ultrareliable, Byzantine-resilient parallel computer. Interprocessor connectivity requirements are met by treating connectivity as a resource that is shared among many processing elements, allowing flexibility in their configuration and reducing complexity. Redundant groups are synchronized solely by message transmissions and receptions, which also provide input data consistency and output voting. Reliability analysis results are presented that demonstrate the reduced failure probability of such a system. Performance analysis results are presented that quantify the temporal overhead involved in executing such fault-tolerance-specific operations. Empirical performance measurements of prototypes of the architecture are presented. 30 refs.

  3. Parallel Assembly of LIGA Components

    SciTech Connect

    Christenson, T.R.; Feddema, J.T.

    1999-03-04

    In this paper, a prototype robotic workcell for the parallel assembly of LIGA components is described. A Cartesian robot is used to press 386 and 485 micron diameter pins into a LIGA substrate and then place a 3-inch diameter wafer with LIGA gears onto the pins. Upward and downward looking microscopes are used to locate holes in the LIGA substrate, pins to be pressed in the holes, and gears to be placed on the pins. This vision system can locate parts within 3 microns, while the Cartesian manipulator can place the parts within 0.4 microns.

  4. True Shear Parallel Plate Viscometer

    NASA Technical Reports Server (NTRS)

    Ethridge, Edwin; Kaukler, William

    2010-01-01

    This viscometer (which can also be used as a rheometer) is designed for use with liquids over a large temperature range. The device consists of horizontally disposed, similarly sized, parallel plates with a precisely known gap. The lower plate is driven laterally with a motor to apply shear to the liquid in the gap. The upper plate is freely suspended from a double-arm pendulum with a sufficiently long radius to reduce height variations during the swing to negligible levels. A sensitive load cell measures the shear force applied by the liquid to the upper plate. Viscosity is measured by taking the ratio of shear stress to shear rate.
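
    In symbols, the measurement principle is the ratio just described (a generic sketch; F is the load-cell force, A the wetted plate area, U the drive speed of the lower plate, and h the gap):

        \eta \;=\; \frac{\tau}{\dot{\gamma}} \;=\; \frac{F/A}{U/h} \;=\; \frac{F\,h}{A\,U}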

  5. Exploring Parallel Concordancing in English and Chinese.

    ERIC Educational Resources Information Center

    Lixun, Wang

    2001-01-01

    Investigates the value of computer technology as a medium for the delivery of parallel texts in English and Chinese for language learning. An English-Chinese parallel corpus was created for use in parallel concordancing--a technique that has been developed to respond to the desire to study language in its natural contexts of use. (Author/VWL)

  6. Parallel Computing Using Web Servers and "Servlets".

    ERIC Educational Resources Information Center

    Lo, Alfred; Bloor, Chris; Choi, Y. K.

    2000-01-01

    Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

  7. Parallel Computation Of Forward Dynamics Of Manipulators

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1993-01-01

    Report presents parallel algorithms and a special parallel architecture for computation of the forward dynamics of robotic manipulators, the products of an effort to find the best method of parallel computation to achieve the required computational efficiency. Significant speedup of computation is anticipated, as well as cost reduction.

  8. A parallel version of FORM 3

    NASA Astrophysics Data System (ADS)

    Fliegner, D.; Rétey, A.; Vermaseren, J. A. M.

    2001-08-01

    The parallel version of the symbolic manipulation program FORM for clusters of workstations and massive parallel systems is presented. We discuss various cluster architectures and the implementation of the parallel program using message passing (MPI). Performance results for real physics applications are shown.

  9. Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

    ERIC Educational Resources Information Center

    Agarwal, Mayank

    2009-01-01

    The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…

  10. Parallel Processing at the High School Level.

    ERIC Educational Resources Information Center

    Sheary, Kathryn Anne

    This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

  11. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 17 Commodity and Securities Exchanges 1 2010-04-01 2010-04-01 false Parallel proceedings. 12.24... REPARATIONS General Information and Preliminary Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration...

  12. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 17 Commodity and Securities Exchanges 1 2014-04-01 2014-04-01 false Parallel proceedings. 12.24... REPARATIONS General Information and Preliminary Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration...

  13. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 17 Commodity and Securities Exchanges 1 2013-04-01 2013-04-01 false Parallel proceedings. 12.24... REPARATIONS General Information and Preliminary Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration...

  14. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 17 Commodity and Securities Exchanges 1 2011-04-01 2011-04-01 false Parallel proceedings. 12.24... REPARATIONS General Information and Preliminary Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration...

  15. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 17 Commodity and Securities Exchanges 1 2012-04-01 2012-04-01 false Parallel proceedings. 12.24... REPARATIONS General Information and Preliminary Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration...

  16. Fuzzy controller design by parallel genetic algorithms

    NASA Astrophysics Data System (ADS)

    Mondelli, G.; Castellano, G.; Attolico, Giovanni; Distante, Arcangelo

    1998-03-01

    Designing a fuzzy system involves defining membership functions and constructing rules. Carrying out these two steps manually often results in a poorly performing system. Genetic algorithms (GAs) have proved to be a useful tool for designing optimal fuzzy controllers. In order to increase the efficiency and effectiveness of their application, parallel GAs (PGAs), evolving several populations synchronously with different balances between exploration and exploitation, have been implemented using a SIMD machine (APE100/Quadrics). The parameters to be identified are coded in such a way that the algorithm implicitly provides a compact fuzzy controller, by finding only the necessary rules and removing useless inputs from them. Early results on a test case, a fuzzy controller implementing the wall-following task for a real vehicle, provided better fitness values in fewer generations than previous experiments using a sequential implementation of GAs.

  17. A parallel pipelined dataflow trigger processor

    SciTech Connect

    Lee, C.; Miller, G.; Kaplan, D.M.; Sa, J.; Hsiung, Y.B.; Carey, T.; Jeppesen, R.

    1991-04-01

    This paper describes a parallel pipelined dataflow trigger processor which is used in Fermilab E789, an experiment to study low-multiplicity decays of particles containing b or c quarks. The processor consists of an upstream vertex processor and a downstream track processor. The algorithms which reconstruct the postulated particle paths and calculate the particle origin are implemented via interconnected function-specific hardware modules. The algorithm is directly dependent upon the organization of the modules, the specific arrangement of the inter-module cabling, and on-board memory data. The processor indicates the presence of at least one interesting particle pair in the current event by asserting Read on its Read/Skip output. The Read assertion is then used as a trigger to capture all of the event's data for subsequent extensive off-line analysis.

  18. Automating parallel implementation of neural learning algorithms.

    PubMed

    Rana, O F

    2000-06-01

    Neural learning algorithms generally involve a number of identical processing units, which are fully or partially connected, and involve an update function, such as a ramp, a sigmoid or a Gaussian function. Some variations also exist, where units can be heterogeneous, or where an alternative update technique is employed, such as a pulse stream generator. Associated with connections are numerical values that must be adjusted using a learning rule, dictated by parameters that are learning-rule specific, such as momentum, a learning rate, or a temperature, amongst others. Usually, neural learning algorithms involve local updates, and global interaction between units is often discouraged, except in instances where units are fully connected or involve synchronous updates. In all of these instances, concurrency within a neural algorithm cannot be fully exploited without a suitable implementation strategy. A design scheme is described for translating a neural learning algorithm from inception to implementation on a parallel machine using PVM or MPI libraries, or onto programmable logic such as FPGAs. A designer must first describe the algorithm using a specialised Neural Language, from which a Petri net (PN) model is constructed automatically for verification and for building a performance model. The PN model can be used to study issues such as synchronisation points, resource sharing and concurrency within a learning rule. Specialised constructs are provided to enable a designer to express various aspects of a learning rule, such as the number and connectivity of neural nodes, the interconnection strategies, and the information flows required by the learning algorithm. A scheduling and mapping strategy is then used to translate this PN model onto a multiprocessor template. We demonstrate our technique using Kohonen and backpropagation learning rules, implemented on a loosely coupled workstation cluster, and a dedicated parallel machine, with PVM libraries

  19. Development of massively parallel quantum chemistry program SMASH

    SciTech Connect

    Ishimura, Kazuya

    2015-12-31

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  20. Local and nonlocal parallel heat transport in general magnetic fields

    SciTech Connect

    Del-Castillo-Negrete, Diego B; Chacon, Luis

    2011-01-01

    A novel approach for the study of parallel transport in magnetized plasmas is presented. The method avoids numerical pollution issues of grid-based formulations and applies to integrable and chaotic magnetic fields with local or nonlocal parallel closures. In weakly chaotic fields, the method gives the fractal structure of the devil's staircase radial temperature profile. In fully chaotic fields, the temperature exhibits self-similar spatiotemporal evolution with a stretched-exponential scaling function for local closures and an algebraically decaying one for nonlocal closures. It is shown that, for both closures, the effective radial heat transport is incompatible with the quasilinear diffusion model.

  1. Analysis of the Rotopod: An all revolute parallel manipulator

    SciTech Connect

    Schmitt, D.J.; Benavides, G.L.; Bieg, L.F.; Kozlowski, D.M.

    1998-05-16

    This paper introduces a new configuration of parallel manipulator called the Rotopod, which is constructed entirely from revolute-type joints. The Rotopod consists of two platforms connected by six legs and exhibits six Cartesian degrees of freedom. The Rotopod is first compared with other all-revolute-joint parallel manipulators to show its similarities and differences. The inverse kinematics for this mechanism are developed and used to analyze the accessible workspace of the mechanism. Optimization is performed to determine the Rotopod design configurations which maximize the accessible workspace subject to desirable functional constraints.
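
    The Rotopod's revolute-joint kinematics are not given in the abstract; as a generic stand-in, this sketch computes leg lengths for a six-legged parallel platform from a commanded pose. The anchor geometry and the yaw-only rotation are assumptions made purely for illustration.

        # Generic parallel-platform inverse kinematics sketch (hypothetical
        # geometry; not the Rotopod's actual revolute-joint kinematics).
        import numpy as np

        def rot_z(theta):
            c, s = np.cos(theta), np.sin(theta)
            return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

        ang = np.arange(6) * np.pi / 3          # six legs, evenly spaced
        B = np.stack([2 * np.cos(ang), 2 * np.sin(ang), np.zeros(6)], axis=1)   # base anchors
        P = np.stack([np.cos(ang + 0.3), np.sin(ang + 0.3), np.zeros(6)], axis=1)  # platform anchors

        def leg_lengths(pos, yaw):
            """Leg lengths for a platform at position pos with the given yaw."""
            world_P = (rot_z(yaw) @ P.T).T + pos    # platform anchors in the world frame
            return np.linalg.norm(world_P - B, axis=1)

        print(leg_lengths(np.array([0.0, 0.0, 1.5]), yaw=0.1))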

  2. Development of massively parallel quantum chemistry program SMASH

    NASA Astrophysics Data System (ADS)

    Ishimura, Kazuya

    2015-12-01

    A massively parallel program for quantum chemistry calculations, SMASH, was released under the Apache License 2.0 in September 2014. The SMASH program is written in Fortran 90/95 with the MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to keep program development simple. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  3. Xyce parallel electronic simulator design.

    SciTech Connect

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed-memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator, so having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development was primarily circuits for nuclear weapons. However, this has not been the only focus, and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort involving a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to this diversity of background, long-term projects can expect a certain amount of staff turnover as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document, in one place, a number of the software quality practices followed by the Xyce team. Also, it is hoped that this document will be a good source of information for new developers.

  4. Efficient parallel global garbage collection on massively parallel computers

    SciTech Connect

    Kamada, Tomio; Matsuoka, Satoshi; Yonezawa, Akinori

    1994-12-31

    On distributed-memory high-performance MPPs where processors are interconnected by an asynchronous network, efficient garbage collection (GC) becomes difficult due to inter-node references and references within pending, unprocessed messages. The parallel global GC algorithm presented here (1) takes advantage of reference locality, (2) efficiently traverses references over nodes, (3) admits minimal pause times for ongoing computations, and (4) has been shown to scale up to 1024-node MPPs. The algorithm employs a global weight counting scheme to substantially reduce message traffic. Two methods are used to confirm the arrival of pending messages: one counts the number of messages and the other uses network "bulldozing." Performance evaluation of actual implementations on the Fujitsu AP1000, a multicomputer with 32-1024 nodes, reveals various favorable properties of the algorithm.
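
    The global weight counting scheme is a form of weighted reference counting: each remote reference carries part of the object's weight, so duplicating a reference splits weight locally instead of messaging the owner, and only releasing a reference requires returning weight. A minimal single-process sketch of that bookkeeping (the inter-node messages and the arrival-confirmation protocols from the paper are elided; all names are hypothetical):

        # Weighted reference counting sketch (bookkeeping only; inter-node
        # messages and arrival-confirmation protocols are elided).
        class Obj:
            def __init__(self):
                self.total_weight = 0          # sum of weights of live references

        class Ref:
            def __init__(self, obj, weight=64):
                self.obj, self.weight = obj, weight
                obj.total_weight += weight

            def copy(self):
                """Duplicate without contacting the owner: split our weight."""
                half, self.weight = self.weight // 2, self.weight - self.weight // 2
                r = Ref.__new__(Ref)
                r.obj, r.weight = self.obj, half
                return r

            def drop(self):
                """Return weight to the owner; the object is garbage at zero."""
                self.obj.total_weight -= self.weight
                self.weight = 0

        o = Obj()
        a = Ref(o); b = a.copy()
        a.drop(); b.drop()
        assert o.total_weight == 0             # object is now collectable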

  5. PREFACE: INERA Workshop: Transition Metal Oxide Thin Films-functional Layers in "Smart windows" and Water Splitting Devices. Parallel session of the 18th International School on Condensed Matter Physics

    NASA Astrophysics Data System (ADS)

    2014-11-01

    This special issue presents the papers of the INERA Workshop "Transition Metal Oxides as Functional Layers in Smart Windows and Water Splitting Devices", which was held in Varna, St. Konstantin and Elena, Bulgaria, from 4-6 September 2014. The Workshop was organized within the context of the INERA "Research and Innovation Capacity Strengthening of ISSP-BAS in Multifunctional Nanostructures" FP7 project (REGPOT 316309), a European project of the Institute of Solid State Physics at the Bulgarian Academy of Sciences. There were 42 participants at the workshop, 16 from Sweden, Germany, Romania and Hungary, 11 invited lecturers, and 28 young participants, among them researchers from prestigious European laboratories that lead the field of transition metal oxide thin film technologies. The event contributed to training young researchers in innovative thin film technologies, as well as in thin film characterization techniques. The topics of the Workshop cover the technology and investigation of thin oxide films as functional layers in "smart window" and "water splitting" devices; they concern the application of novel technologies for the preparation of transition metal oxide films and the modification of chromogenic properties towards improved electrochromic and thermochromic device parameters for possible industrial deployment. The Workshop addressed the following topics: metal oxide films as functional layers in energy-efficient devices; photocatalysts and chemical sensing; novel thin film technologies and applications; and methods of thin film characterization. Of the 37 abstracts submitted, 21 manuscripts were written and subsequently refereed. We appreciate the comments from all the referees, and we are grateful for their valuable contributions. Guest Editors: Assoc. Prof. Dr. Tatyana Ivanova, Prof. DSc Kostadinka Gesheva, Prof. DSc Hassan Chamati, Assoc. Prof. Dr. Georgi Popkirov. Workshop Organizing Committee: Prof

  6. New Parallel computing framework for radiation transport codes

    SciTech Connect

    Kostin, M.A.; Mokhov, N.V.; Niita, K.; /JAERI, Tokai

    2010-09-01

    A new parallel computing framework has been developed for use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is largely independent of the radiation transport codes it is used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations from a saved checkpoint file. The checkpoint facility can be used in single-process calculations as well as in the parallel regime. Several checkpoint files can be merged into one, thus combining the results of several calculations. The framework also corrects some of the known problems with scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and on networks of workstations, where interference from other users is possible.

  7. A parallel execution model for Prolog

    SciTech Connect

    Fagin, B.

    1987-01-01

    In this thesis a new parallel execution model for Prolog is presented: the PPP model, or Parallel Prolog Processor. The PPP supports AND-parallelism, OR-parallelism, and intelligent backtracking. An implementation of the PPP is described, through the extension of an existing Prolog abstract machine architecture. Several examples of PPP execution are presented, and compilation to the PPP abstract instruction set is discussed. The performance effects of this model are reported, based on a simulation of a large benchmark set. The implications of these results for parallel Prolog systems are discussed, and directions for future work are indicated.

  8. Multi-prediction particle filter for efficient parallelized implementation

    NASA Astrophysics Data System (ADS)

    Chu, Chun-Yuan; Chao, Chih-Hao; Chao, Min-An; Wu, An-Yeu Andy

    2011-12-01

    Particle filter (PF) is an emerging signal processing methodology that can effectively deal with nonlinear and non-Gaussian signals through a sample-based approximation of the state probability density function. The particle generation of the PF is a data-independent procedure and can be implemented in parallel. However, the resampling procedure in the PF is sequential in nature and difficult to parallelize. By Amdahl's law, the sequential portion of a task limits the maximum speed-up of a parallelized implementation. Moreover, a large number of particles is usually required to obtain an accurate estimation, and the complexity of the resampling procedure is highly related to the number of particles. In this article, we propose a multi-prediction (MP) framework with two selection approaches. The proposed MP framework can reduce the particle number required for a target estimation accuracy, so the sequential resampling work is reduced. Moreover, the overhead of the MP framework can easily be compensated by parallel implementation. The proposed MP-PF alleviates the global sequential operation by increasing the local parallel computation. In addition, the MP-PF is very suitable for multi-core graphics processing unit (GPU) platforms, a popular parallel processing architecture. We give prototypical implementations of the MP-PFs on a multi-core GPU platform. For the classic bearing-only tracking experiments, the proposed MP-PF can be 25.1 and 15.3 times faster than the sequential importance resampling PF with 10,000 and 20,000 particles, respectively. Hence, the proposed MP-PF can enhance the efficiency of parallelization.
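
    The Amdahl's-law bound invoked here is easy to make concrete: if a fraction s of the work (e.g., resampling) stays sequential, the speed-up on n cores is capped at 1/(s + (1 - s)/n). A sketch with an assumed sequential fraction:

        # Amdahl's-law ceiling on PF speed-up (the sequential fraction 0.10
        # is an assumed, illustrative value).
        def amdahl(s, n):
            """Maximum speed-up on n cores when a fraction s of work is sequential."""
            return 1.0 / (s + (1.0 - s) / n)

        for n in (4, 16, 64, 256):
            print(n, round(amdahl(0.10, n), 1))
        # Even with unlimited cores the speed-up is bounded by 1/s = 10x,
        # which is why the MP framework targets the resampling step itself.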

  9. Parallelization strategy for large-scale vibronic coupling calculations.

    PubMed

    Rabidoux, Scott M; Eijkhout, Victor; Stanton, John F

    2014-12-26

    The vibronic coupling model of Köppel, Domcke, and Cederbaum is a powerful means to understand, predict, and analyze electronic spectra of molecules, especially those that exhibit phenomena that involve breakdown of the Born-Oppenheimer approximation. In this work, we describe a new parallel algorithm for carrying out such calculations. The algorithm is conceptually founded upon a "stencil" representation of the required computational steps, which motivates an efficient strategy for coarse-grained parallelization. The equations involved in the direct-CI type diagonalization of the model Hamiltonian are presented, the parallelization strategy is discussed in detail, and the method is illustrated by calculations involving direct-product basis sets with as many as 17 vibrational modes and 130 billion basis functions. PMID:25295469

  10. Adaptive, multiresolution visualization of large data sets using parallel octrees.

    SciTech Connect

    Freitag, L. A.; Loy, R. M.

    1999-06-10

    The interactive visualization and exploration of large scientific data sets is a challenging and difficult task; their size often far exceeds the performance and memory capacity of even the most powerful graphics workstations. To address this problem, we have created a technique that combines hierarchical data reduction methods with parallel computing to allow interactive exploration of large data sets while retaining full-resolution capability. The hierarchical representation is built in parallel by strategically inserting field data into an octree data structure. We provide functionality that allows the user to interactively adapt the resolution of the reduced data sets so that resolution is increased in regions of interest without sacrificing local graphics performance. We describe the creation of the reduced data sets using a parallel octree, the software architecture of the system, and the performance of this system on the data from a Rayleigh-Taylor instability simulation.
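
    A minimal serial sketch of the core operation, inserting field samples into an octree whose nodes keep running averages for reduced-resolution display (the paper's parallel construction and adaptive refinement policies are not shown; the API is hypothetical):

        # Serial octree insertion sketch with per-node field averages
        # (hypothetical API; the paper builds the tree in parallel).
        class Node:
            def __init__(self, center, half):
                self.center, self.half = center, half   # cube center and half-width
                self.children = [None] * 8
                self.sum, self.count = 0.0, 0           # running average of field data

            def insert(self, point, value, depth, max_depth=5):
                self.sum += value
                self.count += 1
                if depth == max_depth:
                    return
                octant = sum(1 << i for i in range(3) if point[i] >= self.center[i])
                if self.children[octant] is None:
                    h = self.half / 2
                    off = [h if point[i] >= self.center[i] else -h for i in range(3)]
                    child_center = [self.center[i] + off[i] for i in range(3)]
                    self.children[octant] = Node(child_center, h)
                self.children[octant].insert(point, value, depth + 1, max_depth)

            def average(self):      # reduced-resolution value for this region
                return self.sum / self.count

        root = Node([0.0, 0.0, 0.0], 1.0)
        root.insert([0.3, -0.2, 0.7], value=2.5, depth=0)
        print(root.average())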

  11. Object-Based Parallel Framework for Scientific Computing

    NASA Astrophysics Data System (ADS)

    Pierce, Brian; Omelchenko, Y. A.

    1999-11-01

    We have developed a library of software in Fortran 90 and MPI for running simulations on massively parallel facilities. This is modeled after Omelchenko's FLAME code which was written in C++. With Fortran 90 we found several advantages, such as the array syntax and the intrinsic functions. The parallel portion of this library is achieved by dividing the data into subdomains, and distributing the subdomains among the processors to be computed concurrently (with periodic updates in neighboring region information as is necessary). The library is flexible enough so that one can use it to run simulations on any number of processors, and the user can divide up the data between the processors in an arbitrary fashion. We have tested this library for correctness and speed by using it to conduct simulations on a parallel cluster at General Atomics and on a serial workstation.

  12. Conductance of quantum interference transistors in parallel and in series

    NASA Astrophysics Data System (ADS)

    Nikolić, K.; Nikolić, P.; Šordan, R.

    1999-07-01

    We theoretically study the electronic conductance G and the current-voltage characteristics of two quantum interference transistors in parallel and in series. We use two different definitions of conductance, G ∝ T and G ∝ T/R. Neither reproduces the classical additivity law for coherent transport, owing to quantum interference for elements in series and to quasibound states for elements in parallel. For two transistors in series, we find that the quantity T/R represents the additivity law only qualitatively better, as might be expected, since this definition avoids counting the contact resistance twice. For the parallel configuration of transistors, however, the conductance with G ∝ T is almost additive for the majority of energies, except in the single-mode regime. Possible uses of these configurations for basic logic functions in digital electronics are discussed.
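
    Reading "∝" as proportionality, the two definitions plausibly correspond, up to the conductance quantum, to the standard two-terminal and contact-resistance-free Landauer forms for a single channel with R = 1 - T (the 2t/4t labels below are ours, not the paper's):

        G_{\mathrm{2t}} \propto T,
        \qquad
        G_{\mathrm{4t}} \propto \frac{T}{R} = \frac{T}{1-T},
        \qquad
        G_0 = \frac{2e^2}{h}.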

  13. Parallel job scheduling policies to improve fairness : a case study.

    SciTech Connect

    Leung, Vitus Joseph; Sabin, Gerald; Sadayappan, Ponnuswamy

    2008-02-01

    Balancing fairness, user performance, and system performance is a critical concern when developing and installing parallel schedulers. Sandia uses a customized scheduler to manage many of its parallel machines. A primary function of the scheduler is to ensure that the machines have good utilization and that users are treated in a 'fair' manner. A separate compute process allocator (CPA) ensures that the jobs on the machines are not too fragmented in order to maximize throughput. Until recently, there has been no established technique to measure the fairness of parallel job schedulers. This paper introduces a 'hybrid' fairness metric that is similar to recently proposed metrics. The metric uses the Sandia version of a 'fairshare' queuing priority as the basis for fairness. The hybrid fairness metric is used to evaluate a Sandia workload. Using these results, multiple scheduling strategies are introduced to improve performance while satisfying user and system performance constraints.

  14. Parallel iterative solvers and preconditioners using approximate hierarchical methods

    SciTech Connect

    Grama, A.; Kumar, V.; Sameh, A.

    1996-12-31

    In this paper, we report results on the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on the convergence and show that with minimal loss in accuracy, our solver yields significant speedups. We demonstrate the excellent parallel efficiency and scalability of our solver. The combined speedups from approximation and parallelism represent an improvement of several orders of magnitude in solution time. We also develop fast and parallelizable preconditioners for this problem. We report on the performance of an inner-outer scheme and a preconditioner based on a truncated Green's function. Experimental results on a 256-processor Cray T3D are presented.
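
    The hierarchical approximate matrix-vector product enters GMRES as a black box, i.e., a matrix-free operator. A small SciPy sketch of that structure (the dense product below merely stands in for the Barnes-Hut/FMM kernel that the paper approximates and parallelizes):

        # Matrix-free GMRES sketch: the LinearOperator's matvec is where a
        # Barnes-Hut / FMM approximation would go (here just a dense product).
        import numpy as np
        from scipy.sparse.linalg import LinearOperator, gmres

        n = 200
        rng = np.random.default_rng(0)
        A_dense = np.eye(n) + 0.01 * rng.normal(size=(n, n))   # well-conditioned test matrix
        b = rng.normal(size=n)

        def approx_matvec(v):
            # Stand-in for the hierarchical O(n log n) approximate product.
            return A_dense @ v

        A = LinearOperator((n, n), matvec=approx_matvec)
        x, info = gmres(A, b)
        print(info, np.linalg.norm(approx_matvec(x) - b))      # info == 0 on convergence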

  15. Automating the parallel processing of fluid and structural dynamics calculations

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.; Cole, Gary L.

    1987-01-01

    The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.

  16. Automating the parallel processing of fluid and structural dynamics calculations

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.; Cole, Gary L.

    1987-01-01

    The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.

  17. Parallel algorithm of VLBI software correlator under multiprocessor environment

    NASA Astrophysics Data System (ADS)

    Zheng, Weimin; Zhang, Dong

    2007-11-01

    The correlator is the key signal processing equipment of a Very Long Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass of data collected by the VLBI observatories and produces the visibility function of the target, which can be used for spacecraft positioning, baseline-length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is both data-intensive and computation-intensive. This paper presents the algorithms of two parallel software correlators for multiprocessor environments. A near real-time correlator for spacecraft tracking adopts pipelining and thread-parallel technology, and runs on SMP (Symmetric Multiprocessor) servers. Another high-speed prototype correlator, using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm, is realized on a small Beowulf cluster platform. Both correlators have a flexible structure, scalability, and the ability to correlate data from 10 stations.
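
    Per baseline, an FX software correlator computes a channelized cross-power spectrum: FFT each station's stream, multiply one spectrum by the conjugate of the other, and accumulate. A single-baseline sketch on toy data (the pipelining, threading, and MPI distribution described in the paper are omitted):

        # Single-baseline FX correlation sketch (toy data; the pipelining,
        # threading, and MPI distribution in the paper are omitted).
        import numpy as np

        rng = np.random.default_rng(1)
        nfft, nframes = 1024, 64
        common = rng.normal(size=nfft * nframes)      # signal seen by both stations
        s1 = common + 0.5 * rng.normal(size=common.size)
        s2 = np.roll(common, 3) + 0.5 * rng.normal(size=common.size)  # delayed copy

        acc = np.zeros(nfft, dtype=complex)
        for k in range(nframes):                      # "F" (FFT) then "X" (multiply)
            f1 = np.fft.fft(s1[k * nfft:(k + 1) * nfft])
            f2 = np.fft.fft(s2[k * nfft:(k + 1) * nfft])
            acc += f2 * np.conj(f1)                   # accumulate cross-power spectrum

        print(np.argmax(np.abs(np.fft.ifft(acc))))    # lag peak at the delay, 3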

  18. Parallel garbage collection on a virtual memory system

    SciTech Connect

    Abraham, S.G.; Patel, J.H.

    1987-01-01

    Since most artificial intelligence applications are programmed in list processing languages, it is important to design architectures to support efficient garbage collection. This paper presents an architecture and an associated algorithm for parallel garbage collection on a virtual memory system. All the previously proposed parallel algorithms attempt to collect cells released by the list processor during the garbage collection cycle. We do not attempt to collect such cells. As a consequence, the list processor incurs little overhead in the proposed scheme, since it need not synchronize with the collector. Most parallel algorithms are designed for shared memory machines which have certain implicit synchronization functions on variable access. The proposed algorithm is designed for virtual memory systems where both the list processor and the garbage collector have private memories. The enforcement of coherence between the two private memories can be expensive and is not necessary in our scheme. 15 refs., 3 figs.

  19. QCMPI: A parallel environment for quantum computing

    NASA Astrophysics Data System (ADS)

    Tabakin, Frank; Juliá-Díaz, Bruno

    2009-06-01

    QCMPI is a quantum computer (QC) simulation package written in Fortran 90 with parallel processing capabilities. It is an accessible research tool that permits rapid evaluation of quantum algorithms for a large number of qubits and for various "noise" scenarios. The prime motivation for developing QCMPI is to facilitate numerical examination not only of how QC algorithms work, but also to include noise, decoherence, and attenuation effects and to evaluate the efficacy of error correction schemes. The present work builds on an earlier Mathematica code, QDENSITY, which is mainly a pedagogic tool. Although that earlier work featured the density matrix formulation, a description using state vectors was also provided. In QCMPI, the stress is on state vectors, in order to employ a large number of qubits. The parallel processing feature is implemented using the Message-Passing Interface (MPI) protocol. A description of how to spread the wave function components over many processors is provided, along with how to efficiently describe the action of general one- and two-qubit operators on these state vectors. These operators include the standard Pauli, Hadamard, CNOT and CPHASE gates and also the quantum Fourier transform. These operators make up the actions needed in QC. Codes for Grover's search and Shor's factoring algorithms are provided as examples. A major feature of this work is that concurrent versions of the algorithms can be evaluated, with each version subject to alternate noise effects, which corresponds to the idea of solving a stochastic Schrödinger equation. The density matrix for the ensemble of such noise cases is constructed using parallel distribution methods to evaluate its eigenvalues and associated entropy. Potential applications of this powerful tool include studies of the stability and correction of QC processes using Hamiltonian-based dynamics. Program summary: Program title: QCMPI; Catalogue identifier: AECS_v1_0; Program summary URL
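
    The storage idea behind spreading the wave function over processors: a state of n qubits has 2^n amplitudes, divided into contiguous blocks of 2^n/P per processor; a gate on a low-order qubit touches only amplitudes within a block and needs no communication, while high-order qubits pair amplitudes across processors. A serial sketch that mimics P ranks (the layout convention is an assumption; QCMPI's actual Fortran/MPI data structures are not shown):

        # Distributed state-vector layout sketch: P "ranks" each hold a contiguous
        # slice of the 2**n amplitudes (serial stand-in for the MPI distribution).
        import numpy as np

        n, P = 4, 4                              # 4 qubits, 4 hypothetical ranks
        dim, local = 2**n, 2**n // P
        state = np.zeros(dim, dtype=complex); state[0] = 1.0   # |0000>
        slices = [state[r * local:(r + 1) * local] for r in range(P)]  # views, one per rank

        H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

        def apply_local_gate(block, gate, qubit):
            """Apply a 1-qubit gate when 2**qubit < len(block): purely local work.
            Convention: qubit 0 is the least significant bit of the amplitude index."""
            stride = 2**qubit
            for base in range(0, len(block), 2 * stride):
                for i in range(base, base + stride):
                    a, b = block[i], block[i + stride]
                    block[i]          = gate[0, 0] * a + gate[0, 1] * b
                    block[i + stride] = gate[1, 0] * a + gate[1, 1] * b

        for blk in slices:                       # Hadamard on qubit 0: no communication
            apply_local_gate(blk, H, qubit=0)
        print(np.round(state, 3))                # amplitudes updated in place per "rank"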

  20. Massively Parallel Sequencing: The Next Big Thing in Genetic Medicine

    PubMed Central

    Tucker, Tracy; Marra, Marco; Friedman, Jan M.

    2009-01-01

    Massively parallel sequencing has reduced the cost and increased the throughput of genomic sequencing by more than three orders of magnitude, and it seems likely that costs will fall and throughput improve even more in the next few years. Clinical use of massively parallel sequencing will provide a way to identify the cause of many diseases of unknown etiology through simultaneous screening of thousands of loci for pathogenic mutations and by sequencing biological specimens for the genomic signatures of novel infectious agents. In addition to providing these entirely new diagnostic capabilities, massively parallel sequencing may also replace arrays and Sanger sequencing in clinical applications where they are currently being used. Routine clinical use of massively parallel sequencing will require higher accuracy, better ways to select genomic subsets of interest, and improvements in the functionality, speed, and ease of use of data analysis software. In addition, substantial enhancements in laboratory computer infrastructure, data storage, and data transfer capacity will be needed to handle the extremely large data sets produced. Clinicians and laboratory personnel will require training to use the sequence data effectively, and appropriate methods will need to be developed to deal with the incidental discovery of pathogenic mutations and variants of uncertain clinical significance. Massively parallel sequencing has the potential to transform the practice of medical genetics and related fields, but the vast amount of personal genomic data produced will increase the responsibility of geneticists to ensure that the information obtained is used in a medically and socially responsible manner. PMID:19679224

  1. A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database

    DOE PAGESBeta

    Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; Shephard, Mark S.

    2013-01-01

    Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes, which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm for creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in a 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process, thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n ghost layers, up to the point where the whole partitioned mesh is ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.

  2. Parallel heat transport in integrable and chaotic magnetic fields

    SciTech Connect

    Castillo-Negrete, D. del; Chacon, L.

    2012-05-15

    The study of transport in magnetized plasmas is a problem of fundamental interest in controlled fusion, space plasmas, and astrophysics research. Three issues make this problem particularly challenging: (i) the extreme anisotropy between the parallel (i.e., along the magnetic field) conductivity χ∥ and the perpendicular conductivity χ⊥ (χ∥/χ⊥ may exceed 10¹⁰ in fusion plasmas); (ii) nonlocal parallel transport in the limit of small collisionality; and (iii) magnetic field line chaos, which in general complicates (and may preclude) the construction of magnetic field line coordinates. Motivated by these issues, we present a Lagrangian Green's function method to solve the local and non-local parallel transport equation applicable to integrable and chaotic magnetic fields in arbitrary geometry. The method avoids by construction the numerical pollution issues of grid-based algorithms. The potential of the approach is demonstrated with nontrivial applications to integrable (magnetic island), weakly chaotic (Devil's staircase), and fully chaotic magnetic field configurations. For the latter, numerical solutions of the parallel heat transport equation show that the effective radial transport, with local and non-local parallel closures, is non-diffusive, thus casting doubts on the applicability of quasilinear diffusion descriptions. General conditions for the existence of non-diffusive, multivalued flux-gradient relations in the temperature evolution are derived.
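
    For reference, the local limit of the parallel transport problem described above is the standard anisotropic heat conduction equation (a textbook form; the paper's nonlocal closures replace the parallel Fourier flux):

        \frac{\partial T}{\partial t}
          = \nabla \cdot \left[ \chi_\parallel\, \hat{\mathbf b}\, (\hat{\mathbf b} \cdot \nabla T) \right]
          + \nabla \cdot \left[ \chi_\perp \big( \nabla T - \hat{\mathbf b}\, (\hat{\mathbf b} \cdot \nabla T) \big) \right],
        \qquad \hat{\mathbf b} = \mathbf{B} / |\mathbf{B}|.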

  3. A parallel coupled oceanic-atmospheric general circulation model

    SciTech Connect

    Wehner, M.F.; Bourgeois, A.J.; Eltgroth, P.G.; Duffy, P.B.; Dannevik, W.P.

    1994-12-01

    The Climate Systems Modeling group at LLNL has developed a portable coupled oceanic-atmospheric general circulation model suitable for use on a variety of massively parallel (MPP) computers of the multiple instruction, multiple data (MIMD) class. The model is composed of parallel versions of the UCLA atmospheric general circulation model, the GFDL modular ocean model (MOM) and a dynamic sea ice model based on the Hibler formulation extracted from the OPYC ocean model. The strategy to achieve parallelism is twofold. One level of parallelism is accomplished by applying two-dimensional domain decomposition techniques to each of the three constituent submodels. A second level of parallelism is attained by concurrent execution of the AGCM and OGCM/sea ice components on separate sets of processors. For this functional decomposition scheme, a flux coupling module has been written to calculate the heat, moisture and momentum fluxes independently of either the AGCM or the OGCM modules. The flux coupler's other roles are to facilitate the transfer of data between subsystem components and processors via message passing techniques and to interpolate and aggregate between the possibly incommensurate meshes.

  4. FermiQCD: A tool kit for parallel lattice QCD applications

    SciTech Connect

    Di Pierro, M.

    2002-03-01

    We present here the most recent version of FermiQCD, a collection of C++ classes, functions and parallel algorithms for lattice QCD, based on Matrix Distributed Processing. FermiQCD allows fast development of parallel lattice applications and includes some SSE2 optimizations for clusters of Pentium 4 PCs.

  5. Information hiding in parallel programs

    SciTech Connect

    Foster, I.

    1992-01-30

    A fundamental principle in program design is to isolate difficult or changeable design decisions. Application of this principle to parallel programs requires identification of decisions that are difficult or subject to change, and the development of techniques for hiding these decisions. We experiment with three complex applications, and identify mapping, communication, and scheduling as areas in which decisions are particularly problematic. We develop computational abstractions that hide such decisions, and show that these abstractions can be used to develop elegant solutions to programming problems. In particular, they allow us to encode common structures, such as transforms, reductions, and meshes, as software cells and templates that can be reused in different applications. An important characteristic of these structures is that they do not incorporate mapping, communication, or scheduling decisions: these aspects of the design are specified separately, when composing existing structures to form applications. This separation of concerns allows the same cells and templates to be reused in different contexts.

  6. Nanocapillary Adhesion between Parallel Plates.

    PubMed

    Cheng, Shengfeng; Robbins, Mark O

    2016-08-01

    Molecular dynamics simulations are used to study capillary adhesion from a nanometer scale liquid bridge between two parallel flat solid surfaces. The capillary force, Fcap, and the meniscus shape of the bridge are computed as the separation between the solid surfaces, h, is varied. Macroscopic theory predicts the meniscus shape and the contribution of liquid/vapor interfacial tension to Fcap quite accurately for separations as small as two or three molecular diameters (1-2 nm). However, the total capillary force differs in sign and magnitude from macroscopic theory for h ≲ 5 nm (8-10 diameters) because of molecular layering that is not included in macroscopic theory. For these small separations, the pressure tensor in the fluid becomes anisotropic. The components in the plane of the surface vary smoothly and are consistent with theory based on the macroscopic surface tension. Capillary adhesion is affected by only the perpendicular component, which has strong oscillations as the molecular layering changes. PMID:27413872

  7. Embodied and Distributed Parallel DJing.

    PubMed

    Cappelen, Birgitta; Andersson, Anders-Petter

    2016-01-01

    Everyone has a right to take part in cultural events and activities, such as music performances and music making. Enforcing that right, within Universal Design, is often limited to a focus on physical access to public areas, hearing aids etc., or groups of persons with special needs performing in traditional ways. The latter might be people with disabilities, being musicians playing traditional instruments, or actors playing theatre. In this paper we focus on the innovative potential of including people with special needs, when creating new cultural activities. In our project RHYME our goal was to create health promoting activities for children with severe disabilities, by developing new musical and multimedia technologies. Because of the users' extreme demands and rich contribution, we ended up creating both a new genre of musical instruments and a new art form. We call this new art form Embodied and Distributed Parallel DJing, and the new genre of instruments for Empowering Multi-Sensorial Things. PMID:27534347

  8. Parallel discovery of Alzheimer's therapeutics.

    PubMed

    Lo, Andrew W; Ho, Carole; Cummings, Jayna; Kosik, Kenneth S

    2014-06-18

    As the prevalence of Alzheimer's disease (AD) grows, so do the costs it imposes on society. Scientific, clinical, and financial interests have focused current drug discovery efforts largely on the single biological pathway that leads to amyloid deposition. This effort has resulted in slow progress and disappointing outcomes. Here, we describe a "portfolio approach" in which multiple distinct drug development projects are undertaken simultaneously. Although a greater upfront investment is required, the probability of at least one success should be higher with "multiple shots on goal," increasing the efficiency of this undertaking. However, our portfolio simulations show that the risk-adjusted return on investment of parallel discovery is insufficient to attract private-sector funding. Nevertheless, the future cost savings of an effective AD therapy to Medicare and Medicaid far exceed this investment, suggesting that government funding is both essential and financially beneficial. PMID:24944190

  9. Parallel Network Simulations with NEURON

    PubMed Central

    Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.

    2009-01-01

    The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488

  10. Self-testing in parallel

    NASA Astrophysics Data System (ADS)

    McKague, Matthew

    2016-04-01

    Self-testing allows us to determine, through classical interaction only, whether some players in a non-local game share particular quantum states. Most work on self-testing has concentrated on developing tests for small states like one pair of maximally entangled qubits, or on tests where there is a separate player for each qubit, as in a graph state. Here we consider the case of testing many maximally entangled pairs of qubits shared between two players. Previously such a test was shown where testing is sequential, i.e., one pair is tested at a time. Here we consider the parallel case where all pairs are tested simultaneously, giving considerably more power to dishonest players. We derive sufficient conditions for a self-test for many maximally entangled pairs of qubits shared between two players and also two constructions for self-tests where all pairs are tested simultaneously.

  11. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1991-01-01

    The main contribution of the effort in the last two years is the introduction of the MOPPS system. After an extensive literature search, we introduced the system, which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications on a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.

  12. Device for balancing parallel strings

    DOEpatents

    Mashikian, Matthew S.

    1985-01-01

    A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.

  13. Parallel computing in enterprise modeling.

    SciTech Connect

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class, where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  14. Integrated Task and Data Parallel Programming

    NASA Technical Reports Server (NTRS)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated

  15. The ParaScope parallel programming environment

    NASA Technical Reports Server (NTRS)

    Cooper, Keith D.; Hall, Mary W.; Hood, Robert T.; Kennedy, Ken; Mckinley, Kathryn S.; Mellor-Crummey, John M.; Torczon, Linda; Warren, Scott K.

    1993-01-01

    The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope's compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in FORTRAN D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.

  16. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2014-10-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  17. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2013-10-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Preliminary results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  18. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2015-11-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fixed-boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.

  19. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  20. Parallel universes of Black Six biology

    PubMed Central

    2014-01-01

    Creation of lethal and synthetic lethal mutations in an experimental organism is a cornerstone of genetic dissection of gene function, and is related to the concept of an essential gene. Common inbred mouse strains carry background mutations, which can act as genetic modifiers, interfering with the assignment of gene essentiality. The inbred strain C57BL/6J, commonly known as “Black Six”, stands out, as it carries a spontaneous homozygous deletion in the nicotinamide nucleotide transhydrogenase (Nnt) gene [GenBank: AH009385.2], resulting in impairment of steroidogenic mitochondria of the adrenal gland, and a multitude of indirect modifier effects, coming from alteration of glucocorticoid-regulated processes. Over time, the popular strain has been used, by means of gene targeting technology, to assign “essential” and “redundant” qualifiers to numerous genes, thus creating an internally consistent “parallel universe” of knowledge. It is unrealistic to suggest phasing-out of this strain, given the scope of shared resources built around it, however, continuing on the road of “strain-unawareness” will result in profound waste of effort, particularly where translational research is concerned. The review analyzes the historical roots of this phenomenon and proposes that building of “parallel universes” should be urgently made visible to a critical reader by obligatory use of unambiguous and persistent tags in publications and databases, such as hypertext links, pointing to a vendor’s strain description web page, or to a digital object identifier (d.o.i.) of the original publication, so that any research done exclusively in C57BL/6J, could be easily identified. Reviewers This article was reviewed by Dr. Neil Smalheiser and Dr. Miguel Andrade-Navarro. PMID:25038798

  1. Parallel Quantum Circuit in a Tunnel Junction.

    PubMed

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

    2016-01-01

    Spectral analysis of 1- and 2-state-per-line quantum buses is normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, a Heisenberg-Rabi time-dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process does not faithfully follow the Ωab(N) variations. Because of the normalisation of the electronic transparency to unity and the low-pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling, and when N is small enough not to compensate for this small coupling, an N² power law is preserved for Ωab(N) and for Vab(N). PMID:27453262

  2. Parallel Quantum Circuit in a Tunnel Junction

    NASA Astrophysics Data System (ADS)

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

    2016-07-01

    Spectral analysis of 1- and 2-state-per-line quantum buses is normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, a Heisenberg-Rabi time-dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process does not faithfully follow the Ωab(N) variations. Because of the normalisation of the electronic transparency to unity and the low-pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling, and when N is small enough not to compensate for this small coupling, an N² power law is preserved for Ωab(N) and for Vab(N).

  2. Parallel optimization methods for agile manufacturing

    SciTech Connect

    Meza, J.C.; Moen, C.D.; Plantenga, T.D.; Spence, P.A.; Tong, C.H.; Hendrickson, B.A.; Leland, R.W.; Reese, G.M.

    1997-08-01

    The rapid and optimal design of new goods is essential for meeting national objectives in advanced manufacturing. Currently, almost all manufacturing procedures involve the determination of some optimal design parameters. This process is iterative in nature, and because it is usually done manually, it can be expensive and time consuming. This report describes the results of an LDRD whose goal was to develop optimization algorithms and software tools that will enable automated design, thereby allowing for agile manufacturing. Although the design processes vary across industries, many of the mathematical characteristics of the problems are the same, including large-scale, noisy, and non-differentiable functions with nonlinear constraints. This report describes the development of a common set of optimization tools, using object-oriented programming techniques, that can be applied to these types of problems. The authors give examples of several representative design problems, including an inverse scattering problem, a vibration isolation problem, a system identification problem for the correlation of finite element models with test data, and the control of a chemical vapor deposition reactor furnace. Because the function evaluations are computationally expensive, they emphasize algorithms that can be adapted to parallel computers.
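
    The emphasis on algorithms whose expensive function evaluations can proceed in parallel can be made concrete with a small sketch (hypothetical objective, not the LDRD's tool set): a compass pattern-search step farms all of its trial design points out to worker processes at once, so the wall-clock cost per iteration is roughly that of a single simulation.

        from concurrent.futures import ProcessPoolExecutor

        def objective(x):
            """Stand-in for an expensive, possibly noisy simulation."""
            return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

        def pattern_search_step(x, step, pool):
            """Evaluate all 2*dim compass-pattern trial points in parallel."""
            trials = [tuple(xi + d * step * (k == i) for i, xi in enumerate(x))
                      for k in range(len(x)) for d in (+1.0, -1.0)]
            scores = list(pool.map(objective, trials))
            best_score, best_x = min(zip(scores, trials))
            return best_x if best_score < objective(x) else x   # move only on improvement

        if __name__ == "__main__":
            x, step = (0.0, 0.0), 0.5
            with ProcessPoolExecutor() as pool:
                for _ in range(20):
                    x = pattern_search_step(x, step, pool)
            print(x)   # converges to the optimum design (1.0, -2.0)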

  3. Berkeley Unified Parallel C (UPC) Runtime Library

    Energy Science and Technology Software Center (ESTSC)

    2003-03-31

    This software comprises a portable, open source implementation of a runtime library to support applications written in the Unified Parallel C (UPC) language. This library implements the UPC-specific functionality, including shared memory allocation and locks. The network-dependent functionality is implemented as a thin wrapper around a separate library implementing the GASNet (Global-Address Space Networking) specification. For true shared memory machines, GASNet is bypassed in favor of direct memory operations and local synchronization mechanisms. The Berkeley UPC Runtime Library is currently the only implementation of the "Berkeley UPC Runtime Specification", and thus the only runtime library usable with the Berkeley UPC Compiler. Also, it is the only UPC runtime known to the author to provide two shared pointer representations: one for arbitrary blocksizes and one optimized for the common cases of phaseless and blocksize=1 pointers. For distributed memory environments, a library implementing the GASNet specification is required for communication. While no specialized hardware is required, a high-speed interconnect supported by the GASNet implementation is suggested for performance. If no supported high-speed interconnect is available, GASNet can run over MPI. An external library is required for certain local memory allocation operations. A well defined interface allows for multiple implementations of this library, but at present the "umalloc" library from LBNL is the only compatible implementation.
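
    The shared pointer arithmetic behind the two representations can be modeled in a few lines (a simplified model of UPC layout rules, not Berkeley UPC internals): a generic pointer must carry a thread, a phase within the block, and a local block number, while the phaseless case collapses to a single field.

        def locate(i, threads, blocksize):
            """Map element i of a UPC-style shared array, declared with the
            given block size, to (thread, phase, local_block) -- the three
            fields a generic shared pointer must carry."""
            if blocksize == 0:                 # "indefinite"/phaseless layout:
                return 0, 0, i                 # all elements on one thread
                                               # (thread 0 here, a simplification)
            block = i // blocksize
            return block % threads, i % blocksize, block // threads

        # shared [3] int a[12] spread over 2 threads:
        for i in range(12):
            print(i, locate(i, threads=2, blocksize=3))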

  4. Multi-petascale highly efficient parallel supercomputer

    SciTech Connect

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources, enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
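
    The five-dimensional torus topology is easy to make concrete (an illustration of the topology only, not the machine's routing logic): every node has ten neighbors, one in each direction of each dimension, with wraparound at the edges.

        def torus_neighbors(coord, dims):
            """Neighbors of a node in a torus of the given extents: +/-1 along
            each axis with wraparound (10 distinct neighbors in 5-D when every
            extent exceeds 2)."""
            nbrs = set()
            for axis in range(len(dims)):
                for step in (+1, -1):
                    nbr = list(coord)
                    nbr[axis] = (nbr[axis] + step) % dims[axis]
                    nbrs.add(tuple(nbr))
            return sorted(nbrs)

        print(torus_neighbors((0, 0, 0, 0, 0), (4, 4, 4, 4, 4)))   # 10 neighbors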

  5. A generic fine-grained parallel C

    NASA Technical Reports Server (NTRS)

    Hamet, L.; Dorband, John E.

    1988-01-01

    With the present availability of parallel processors of vastly different architectures, there is a need for a common language interface to multiple types of machines. The parallel C compiler, currently under development, is intended to be such a language. This language is based on the belief that an algorithm designed around fine-grained parallelism can be mapped relatively easily to different parallel architectures, since a large percentage of the parallelism has been identified. The compiler generates a FORTH-like machine-independent intermediate code. A machine-dependent translator will reside on each machine to generate the appropriate executable code, taking advantage of the particular architectures. The goal of this project is to allow a user to run the same program on such machines as the Massively Parallel Processor, the CRAY, the Connection Machine, and the CYBER 205 as well as serial machines such as VAXes, Macintoshes and Sun workstations.

  6. Parallel automated adaptive procedures for unstructured meshes

    NASA Technical Reports Server (NTRS)

    Shephard, M. S.; Flaherty, J. E.; Decougny, H. L.; Ozturan, C.; Bottasso, C. L.; Beall, M. W.

    1995-01-01

    Consideration is given to the techniques required to support adaptive analysis of automatically generated unstructured meshes on distributed memory MIMD parallel computers. The key areas of new development are focused on the support of effective parallel computations when the structure of the numerical discretization, the mesh, is evolving, and in fact constructed, during the computation. All the procedures presented operate in parallel on already distributed mesh information. Starting from a mesh definition in terms of a topological hierarchy, techniques to support the distribution, redistribution and communication of the mesh entities over the processors are given, and algorithms to dynamically balance processor workload based on the migration of mesh entities are given. A procedure to automatically generate meshes in parallel, starting from CAD geometric models, is given. Parallel procedures to enrich the mesh through local mesh modifications are also given. Finally, the combination of these techniques to produce a parallel automated finite element analysis procedure for rotorcraft aerodynamics calculations is discussed and demonstrated.

  7. Design considerations for parallel graphics libraries

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  8. Parallel computing for probabilistic fatigue analysis

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.

    1993-01-01

    This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism, to keep a large number of processors busy, and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared memory for achieving large-scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.

  9. Runtime volume visualization for parallel CFD

    NASA Technical Reports Server (NTRS)

    Ma, Kwan-Liu

    1995-01-01

    This paper discusses some aspects of design of a data distributed, massively parallel volume rendering library for runtime visualization of parallel computational fluid dynamics simulations in a message-passing environment. Unlike the traditional scheme in which visualization is a postprocessing step, the rendering is done in place on each node processor. Computational scientists who run large-scale simulations on a massively parallel computer can thus perform interactive monitoring of their simulations. The current library provides an interface to handle volume data on rectilinear grids. The same design principles can be generalized to handle other types of grids. For demonstration, we run a parallel Navier-Stokes solver making use of this rendering library on the Intel Paragon XP/S. The interactive visual response achieved is found to be very useful. Performance studies show that the parallel rendering process is scalable with the size of the simulation as well as with the parallel computer.

  10. Algorithmically Specialized Parallel Architecture For Robotics

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1991-01-01

    Computing system called Robot Mathematics Processor (RMP) contains large number of processor elements (PE's) connected in various parallel and serial combinations reconfigurable via software. Special-purpose architecture designed for solving diverse computational problems in robot control, simulation, trajectory generation, workspace analysis, and like. System an MIMD-SIMD parallel architecture capable of exploiting parallelism in different forms and at several computational levels. Major advantage lies in design of cells, which provides flexibility and reconfigurability superior to previous SIMD processors.

  11. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.

  12. Six-Degree-Of-Freedom Parallel Minimanipulator

    NASA Technical Reports Server (NTRS)

    Tahmasebi, Farhad; Tsai, Lung-Wen

    1994-01-01

    Six-degree-of-freedom parallel minimanipulator stiffer and simpler than earlier six-degree-of-freedom manipulators. Includes only three inextensible limbs with universal joints at ends. Limbs have equal lengths and act in parallel as they share load on manipulated platform. Designed to provide high resolution and high stiffness for fine control of position and force in hybrid serial/parallel-manipulator system.

  13. Running Geant on T. Node parallel computer

    SciTech Connect

    Jejcic, A.; Maillard, J.; Silva, J. ); Mignot, B. )

    1990-08-01

    An Inmos transputer-based computer has been utilized to overcome the difficulties due to the limitations on the processing abilities of event parallelism and multiprocessor farms (i.e., the so-called bus crisis) and the concern regarding the growing sizes of databases typical in High Energy Physics. This study was done on the T.Node parallel computer manufactured by TELMAT. Detailed figures are reported concerning the event parallelization. (AIP)

  14. Racing in parallel: Quantum versus Classical

    NASA Astrophysics Data System (ADS)

    Steiger, Damian S.; Troyer, Matthias

    In a fair comparison of the performance of a quantum algorithm to a classical one it is important to treat them on equal footing, both regarding resource usage and parallelism. We show how one may otherwise mistakenly attribute speedup due to parallelism as quantum speedup. We apply such an analysis both to analog quantum devices (quantum annealers) and gate model algorithms and give several examples where a careful analysis of parallelism makes a significant difference in the comparison between classical and quantum algorithms.

  15. A parallel algorithm for global routing

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall J.; Banerjee, Prithviraj

    1990-01-01

    A Parallel Hierarchical algorithm for Global Routing (PHIGURE) is presented. The router is based on the work of Burstein and Pelavin, but has many extensions for general global routing and parallel execution. Main features of the algorithm include structured hierarchical decomposition into separate independent tasks which are suitable for parallel execution and adaptive simplex solution for adding feedthroughs and adjusting channel heights for row-based layout. Alternative decomposition methods and the various levels of parallelism available in the algorithm are examined closely. The algorithm is described and results are presented for a shared-memory multiprocessor implementation.

  16. Parallel execution and scriptability in micromagnetic simulations

    NASA Astrophysics Data System (ADS)

    Fischbacher, Thomas; Franchin, Matteo; Bordignon, Giuliano; Knittel, Andreas; Fangohr, Hans

    2009-04-01

    We demonstrate the feasibility of an "encapsulated parallelism" approach toward micromagnetic simulations that combines offering a high degree of flexibility to the user with the efficient utilization of parallel computing resources. While parallelization is obviously desirable to address the high numerical effort required for realistic micromagnetic simulations through utilizing now widely available multiprocessor systems (including desktop multicore CPUs and computing clusters), conventional approaches toward parallelization impose strong restrictions on the structure of programs: numerical operations have to be executed across all processors in a synchronized fashion. This means that from the user's perspective, either the structure of the entire simulation is rigidly defined from the beginning and cannot be adjusted easily, or making modifications to the computation sequence requires advanced knowledge in parallel programming. We explain how this dilemma is resolved in the NMAG simulation package in such a way that the user can utilize, without any additional effort, both the computational power of multiple CPUs and the flexibility to tailor execution sequences for specific problems: simulation scripts written for single-processor machines can just as well be executed on parallel machines and behave in precisely the same way, up to increased speed. We provide a simple instructive magnetic resonance simulation example that demonstrates utilizing both custom execution sequences and parallelism at the same time. Furthermore, we show that this strategy of encapsulating parallelism even allows one to benefit from speed gains through parallel execution in simulations controlled by interactive commands given at a command line interface.

  17. MMS Observations of Parallel Electric Fields

    NASA Astrophysics Data System (ADS)

    Ergun, R.; Goodrich, K.; Wilder, F. D.; Sturner, A. P.; Holmes, J.; Stawarz, J. E.; Malaspina, D.; Usanova, M.; Torbert, R. B.; Lindqvist, P. A.; Khotyaintsev, Y. V.; Burch, J. L.; Strangeway, R. J.; Russell, C. T.; Pollock, C. J.; Giles, B. L.; Hesse, M.; Goldman, M. V.; Drake, J. F.; Phan, T.; Nakamura, R.

    2015-12-01

    Parallel electric fields are a necessary condition for magnetic reconnection with non-zero guide field and are ultimately accountable for topological reconfiguration of a magnetic field. Parallel electric fields also play a strong role in charged particle acceleration and turbulence. The Magnetospheric Multiscale (MMS) mission targets these three universal plasma processes. The MMS satellites have an accurate three-dimensional electric field measurement, which can identify parallel electric fields as low as 1 mV/m at four adjacent locations. We present preliminary observations of parallel electric fields from MMS and provide an early interpretation of their impact on magnetic reconnection, in particular, where the topological change occurs. We also examine the role of parallel electric fields in particle acceleration. Direct particle acceleration by parallel electric fields is well established in the auroral region. Observations of double layers by the Van Allen Probes suggest that acceleration by parallel electric fields may be significant in energizing some populations of the radiation belts. THEMIS observations also indicate that some of the largest parallel electric fields are found in regions of strong field-aligned currents associated with turbulence, suggesting a highly non-linear dissipation mechanism. We discuss how the MMS observations extend our understanding of the role of parallel electric fields in some of the most critical processes in the magnetosphere.
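
    The quantity at issue is simply the projection of the measured electric field onto the local magnetic field direction, E∥ = (E·B)/|B|; a minimal sketch with made-up sample vectors (not MMS data-access code):

        import numpy as np

        def e_parallel(e_mvm, b_nt):
            """Parallel electric field E_par = (E . B) / |B| for time series
            of 3-D vectors (E in mV/m, B in nT; E_par inherits E's units)."""
            e = np.asarray(e_mvm, dtype=float)
            b = np.asarray(b_nt, dtype=float)
            return np.sum(e * b, axis=-1) / np.linalg.norm(b, axis=-1)

        # Hypothetical sample vectors (mV/m and nT):
        E = [[1.0, 0.5, 0.0], [0.2, -0.1, 0.3]]
        B = [[40.0, 0.0, 0.0], [0.0, 0.0, -20.0]]
        print(e_parallel(E, B))   # [ 1.0  -0.3]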

  18. A Parallel Multigrid Method for Neutronics Applications

    SciTech Connect

    Alcouffe, Raymond E.

    2001-01-01

    The multigrid method has been shown to be the most effective general method for solving the multi-dimensional diffusion equation encountered in neutronics. This being the method of choice, we develop a strategy for implementing the multigrid method on computers of massively parallel architecture. This leads us to strategies for parallelizing the relaxation, contraction (interpolation), and prolongation operators involved in the method. We then compare the efficiency of our parallel multigrid with other parallel methods for solving the diffusion equation on selected problems encountered in reactor physics.
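
    The three operators named above are easy to exhibit on a model problem; the following serial sketch (a 1-D toy, not the massively parallel solver described here) applies two-grid V-cycles with weighted-Jacobi relaxation, injection for contraction, and linear interpolation for prolongation to -u'' = f.

        import numpy as np

        def relax(u, f, h, sweeps):
            """Weighted-Jacobi relaxation for -u'' = f (interior points)."""
            for _ in range(sweeps):
                u[1:-1] += 0.5 * ((u[:-2] + u[2:] + h * h * f[1:-1]) / 2 - u[1:-1])
            return u

        def two_grid(u, f, h, nu=3):
            """One V-cycle: relaxation, contraction to the 2h grid, coarse
            solve, prolongation back, and post-relaxation."""
            u = relax(u, f, h, nu)
            r = np.zeros_like(u)
            r[1:-1] = f[1:-1] + (u[:-2] - 2 * u[1:-1] + u[2:]) / h ** 2  # residual
            rc = r[::2].copy()                        # contraction (injection)
            m = len(rc) - 2                           # coarse interior size
            a = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / (2 * h) ** 2
            ec = np.zeros_like(rc)
            ec[1:-1] = np.linalg.solve(a, rc[1:-1])   # coarse-grid correction
            e = np.zeros_like(u)
            e[::2] = ec                               # prolongation:
            e[1::2] = 0.5 * (e[:-1:2] + e[2::2])      #   linear interpolation
            return relax(u + e, f, h, nu)

        n = 65
        x = np.linspace(0.0, 1.0, n)
        h = x[1] - x[0]
        f = np.pi ** 2 * np.sin(np.pi * x)            # exact solution: sin(pi x)
        u = np.zeros(n)
        for _ in range(10):
            u = two_grid(u, f, h)
        print(np.max(np.abs(u - np.sin(np.pi * x))))  # error at the O(h**2) level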

  1. Conformal pure radiation with parallel rays

    NASA Astrophysics Data System (ADS)

    Leistner, Thomas; Nurowski, Paweł

    2012-03-01

    We define pure radiation metrics with parallel rays to be n-dimensional pseudo-Riemannian metrics that admit a parallel null line bundle K and whose Ricci tensor vanishes on vectors that are orthogonal to K. We give necessary conditions in terms of the Weyl, Cotton and Bach tensors for a pseudo-Riemannian metric to be conformal to a pure radiation metric with parallel rays. Then, we derive conditions in terms of the tractor calculus that are equivalent to the existence of a pure radiation metric with parallel rays in a conformal class. We also give analogous results for n-dimensional pseudo-Riemannian pp-waves.

  2. Parallel auto-correlative statistics with VTK.

    SciTech Connect

    Pebay, Philippe Pierre; Bennett, Janine Camille

    2013-08-01

    This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10], which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by means of C++ code snippets, and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the auto-correlative statistics engine.
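
    The statistic computed by such an engine is the lag-k autocorrelation r_k = C_k/C_0 of a time series; a plain serial version is only a few lines (shown here for orientation, without VTK), and because each C_k is a sum it decomposes naturally into per-process partial sums, which is what makes a scalable parallel engine possible.

        import numpy as np

        def autocorr(x, max_lag):
            """Lag-k autocorrelation r_k = C_k / C_0, with
            C_k = (1/n) * sum_t (x_t - mean) * (x_{t+k} - mean)."""
            x = np.asarray(x, dtype=float)
            d = x - x.mean()
            c0 = np.dot(d, d) / len(x)
            return [np.dot(d[:-k or None], d[k:]) / (len(x) * c0)
                    for k in range(max_lag + 1)]

        rng = np.random.default_rng(0)
        x = np.cumsum(rng.standard_normal(10_000))   # strongly autocorrelated walk
        print([round(r, 3) for r in autocorr(x, 3)]) # r_0 = 1.0, then slow decay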

  3. Remarks on parallel computations in MATLAB environment

    NASA Astrophysics Data System (ADS)

    Opalska, Katarzyna; Opalski, Leszek

    2013-10-01

    The paper attempts to summarize the author's investigation of the parallel computation capability of the MATLAB environment in solving large ordinary differential equations (ODEs). Two MATLAB versions were tested with two parallelization techniques: one using multiple processor cores, the other CUDA-compatible Graphics Processing Units (GPUs). A set of parameterized test problems was specially designed to expose different capabilities and limitations of the variants of the parallel computation environment tested. The results presented clearly illustrate the superiority of the newer MATLAB version and the elapsed-time advantage of GPU-parallelized computations over multiple processor cores for large-dimensionality problems (with the speed-up factor strongly dependent on the problem structure).

  4. A scalable 2-D parallel sparse solver

    SciTech Connect

    Kothari, S.C.; Mitra, S.

    1995-12-01

    Scalability beyond a small number of processors, typically 32 or less, is known to be a problem for existing parallel general sparse (PGS) direct solvers. This paper presents a PGS direct solver for general sparse linear systems on distributed memory machines. The algorithm is based on the well-known sequential sparse algorithm Y12M. To achieve efficient parallelization, a 2-D scattered decomposition of the sparse matrix is used. The proposed algorithm is more scalable than existing parallel sparse direct solvers. Its scalability is evaluated on a 256-processor nCUBE2s machine using Boeing/Harwell benchmark matrices.
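
    The 2-D scattered decomposition behind the solver's scalability is plain modular arithmetic: entry (i, j) goes to process (i mod Pr, j mod Pc), so every process owns work from every region of the matrix as elimination proceeds. A sketch of the mapping (illustrative only, not Y12M code):

        def owner(i, j, pr, pc):
            """Process (row, col) owning matrix entry (i, j) under a 2-D
            scattered (cyclic) decomposition on a pr x pc process grid."""
            return i % pr, j % pc

        # A 6x6 matrix scattered over a 2x2 process grid: each process
        # touches every region of the matrix, balancing load as fill-in grows.
        for i in range(6):
            print([owner(i, j, 2, 2) for j in range(6)])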

  5. Parallel computations and control of adaptive structures

    NASA Technical Reports Server (NTRS)

    Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

    1991-01-01

    The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction to the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the approach presented for applications in disciplines other than the aerospace industry is assessed.

  6. Applications of Parallel Processing to Astrodynamics

    NASA Astrophysics Data System (ADS)

    Coffey, S.; Healy, L.; Neal, H.

    1996-03-01

    Parallel processing is being used to improve the catalog of earth orbiting satellites and for problems associated with the catalog. Initial efforts centered around using SIMD parallel processors to perform debris conjunction analysis and satellite dynamics studies. More recently, the availability of cheap supercomputing processors and parallel processing software such as PVM has enabled the reutilization of existing astrodynamics software in distributed parallel processing environments. Computations once taking many days with traditional mainframes are now being performed in only a few hours. Efforts underway for the US Naval Space Command include conjunction prediction, uncorrelated target processing, and a new space object catalog based on orbit determination and prediction with special perturbations methods.

  7. Parallel pivoting combined with parallel reduction and fill-in control

    NASA Technical Reports Server (NTRS)

    Alaghband, Gita

    1989-01-01

    Parallel algorithms for the triangularization of large, sparse, and unsymmetric matrices are presented. The method combines parallel reduction with a new parallel pivoting technique, control over the generation of fill-ins, and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel pivoting technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
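
    The compatibility relation can be sketched directly: pivots (i, j) and (k, l) are compatible when a[i,l] = 0 and a[k,j] = 0, so eliminating with one does not touch the other's pivot row or column, and candidates are taken in order of increasing Markowitz count (r_i - 1)(c_j - 1). A toy selection routine (illustrative Python, not the paper's implementation):

        import numpy as np

        def compatible_pivots(a, tol=1e-8):
            """Greedily select a set of mutually compatible pivots (usable in
            parallel), in order of increasing Markowitz count."""
            a = np.asarray(a, dtype=float)
            nz = np.abs(a) > tol
            r = nz.sum(axis=1)                  # nonzeros per row
            c = nz.sum(axis=0)                  # nonzeros per column
            candidates = sorted(
                ((int((r[i] - 1) * (c[j] - 1)), i, j)
                 for i, j in zip(*np.nonzero(nz))))
            chosen = []
            for _, i, j in candidates:
                if all(not nz[i, l] and not nz[k, j] for k, l in chosen):
                    chosen.append((i, j))
            return chosen

        a = np.array([[4., 0., 1., 0.],
                      [0., 3., 0., 0.],
                      [0., 0., 2., 5.],
                      [6., 0., 0., 7.]])
        print(compatible_pivots(a))   # [(1, 1), (0, 0), (2, 3)]: 3 parallel pivots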

  8. Super-parallel MR microscope.

    PubMed

    Matsuda, Yoshimasa; Utsuzawa, Shin; Kurimoto, Takeaki; Haishi, Tomoyuki; Yamazaki, Yukako; Kose, Katsumi; Anno, Izumi; Marutani, Mitsuhiro

    2003-07-01

    A super-parallel MR microscope in which multiple (up to 100) samples can be imaged simultaneously at high spatial resolution is described. The system consists of a multichannel transmitter-receiver system and a gradient probe array housed in a large-bore magnet. An eight-channel MR microscope was constructed for verification of the system concept, and a four-channel MR microscope was constructed for a practical application. Eight chemically fixed mouse fetuses were simultaneously imaged at 200 μm³ voxel resolution in a 1.5 T superconducting magnet of a whole-body MRI system, and four chemically fixed human embryos were simultaneously imaged at 120 μm³ voxel resolution in a 2.35 T superconducting magnet. Although the spatial resolutions achieved were not strictly those of MR microscopy, the system design proposed here can be used to attain much higher spatial resolution imaging of multiple samples, because higher magnetic field gradients can be generated at multiple positions in a homogeneous magnetic field. PMID:12815693

  9. Parallel processing in immune networks

    NASA Astrophysics Data System (ADS)

    Agliari, Elena; Barra, Adriano; Bartolucci, Silvia; Galluzzi, Andrea; Guerra, Francesco; Moauro, Francesco

    2013-04-01

    In this work, we adopt a statistical-mechanics approach to investigate basic, systemic features exhibited by adaptive immune systems. The lymphocyte network made by B cells and T cells is modeled by a bipartite spin glass, where, following biological prescriptions, links connecting B cells and T cells are sparse. Interestingly, the dilution performed on links is shown to make the system able to orchestrate parallel strategies to fight several pathogens at the same time; this multitasking capability constitutes a remarkable, key property of immune systems as multiple antigens are always present within the host. We also define the stochastic process ruling the temporal evolution of lymphocyte activity and show its relaxation toward an equilibrium measure allowing statistical-mechanics investigations. Analytical results are compared with Monte Carlo simulations and signal-to-noise outcomes showing overall excellent agreement. Finally, within our model, a rationale for the experimentally well-evidenced correlation between lymphocytosis and autoimmunity is achieved; this sheds further light on the systemic features exhibited by immune networks.

  10. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1995-01-01

    The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.

  11. Vectoring of parallel synthetic jets

    NASA Astrophysics Data System (ADS)

    Berk, Tim; Ganapathisubramani, Bharathram; Gomit, Guillaume

    2015-11-01

    A pair of parallel synthetic jets can be vectored by applying a phase difference between the two driving signals. The resulting jet can be merged or bifurcated and either vectored towards the actuator leading in phase or the actuator lagging in phase. In the present study, the influence of phase difference and Strouhal number on the vectoring behaviour is examined experimentally. Phase-locked vorticity fields, measured using Particle Image Velocimetry (PIV), are used to track vortex pairs. The physical mechanisms that explain the diversity in vectoring behaviour are observed based on the vortex trajectories. For a fixed phase difference, the vectoring behaviour is shown to be primarily influenced by pinch-off time of vortex rings generated by the synthetic jets. Beyond a certain formation number, the pinch-off timescale becomes invariant. In this region, the vectoring behaviour is determined by the distance between subsequent vortex rings. We acknowledge the financial support from the European Research Council (ERC grant agreement no. 277472).

  12. When the Lowest Energy Does Not Induce Native Structures: Parallel Minimization of Multi-Energy Values by Hybridizing Searching Intelligences

    PubMed Central

    Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

    2012-01-01

    Background: Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has luckily been found by the searching procedure, the correct protein structure is not guaranteed to be obtained. Results: A general parallel metaheuristic approach is presented to tackle the above two problems. Multiple energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of the heuristic algorithms. The parallel approach allows the parameters to be perturbed while the searching threads are running in parallel, with each thread searching for the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. Sixteen classical instances were tested to show that the parallel approach is competitive for solving the PSP problem. Conclusions: This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligences embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions, which are usually derived from domain expertise. PMID:23028708
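
    The core idea, several searching threads each guided by its own imperfect energy function with the final conformation judged jointly, can be caricatured in one dimension (toy landscapes and a bare Metropolis walk, nothing like the paper's ant colony machinery):

        import math
        import random
        from concurrent.futures import ThreadPoolExecutor

        def metropolis(energy, steps=20_000, temp=0.5, seed=0):
            """One search thread: a Metropolis walk on a 1-D toy landscape."""
            rng = random.Random(seed)
            x = rng.uniform(-10.0, 10.0)
            for _ in range(steps):
                y = x + rng.gauss(0.0, 0.5)
                if rng.random() < math.exp(min(0.0, (energy(x) - energy(y)) / temp)):
                    x = y
            return x

        # Two deliberately imperfect energy functions for the same problem;
        # neither minimum sits exactly at the (pretend) native state x = 3.
        energies = [lambda x: (x - 2.8) ** 2,
                    lambda x: (x - 3.2) ** 2 + 0.1 * abs(x)]

        with ThreadPoolExecutor() as pool:
            finals = list(pool.map(lambda args: metropolis(*args),
                                   [(e, 20_000, 0.5, i) for i, e in enumerate(energies)]))

        # Joint decision: keep the conformation best under all energies combined.
        best = min(finals, key=lambda x: sum(e(x) for e in energies))
        print(round(best, 2))   # typically a value near 3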

  13. Particle Transport in Parallel-Plate Reactors

    SciTech Connect

    Rader, D.J.; Geller, A.S.

    1999-08-01

    A major cause of semiconductor yield degradation is contaminant particles that deposit on wafers while they reside in processing tools during integrated circuit manufacturing. This report presents numerical models for assessing particle transport and deposition in a parallel-plate geometry characteristic of a wide range of single-wafer processing tools: uniform downward flow exiting a perforated-plate showerhead separated by a gap from a circular wafer resting on a parallel susceptor. Particles are assumed to originate either upstream of the showerhead or from a specified position between the plates. The physical mechanisms controlling particle deposition and transport (inertia, diffusion, fluid drag, and external forces) are reviewed, with an emphasis on conditions encountered in semiconductor process tools (i.e., sub-atmospheric pressures and submicron particles). Isothermal flow is assumed, although small temperature differences are allowed to drive particle thermophoresis. Numerical solutions of the flow field are presented which agree with an analytic, creeping-flow expression for Re < 4. Deposition is quantified by use of a particle collection efficiency, which is defined as the fraction of particles in the reactor that deposit on the wafer. Analytic expressions for collection efficiency are presented for the limiting case where external forces control deposition (i.e., neglecting particle diffusion and inertia). Deposition from simultaneous particle diffusion and external forces is analyzed by an Eulerian formulation; for creeping flow and particles released from a planar trap, the analysis yields an analytic, integral expression for particle deposition based on process and particle properties. Deposition from simultaneous particle inertia and external forces is analyzed by a Lagrangian formulation, which can describe inertia-enhanced deposition resulting from particle acceleration in the showerhead. An approximate analytic expression is derived for particle

  14. A component analysis based on serial results analyzing performance of parallel iterative programs

    SciTech Connect

    Richman, S.C.

    1994-12-31

    This research is concerned with the parallel performance of iterative methods for solving large, sparse, nonsymmetric linear systems. Most of the iterative methods are first presented with their time costs and convergence rates examined intensively on sequential machines, and then adapted to parallel machines. The analysis of parallel iterative performance is more complicated than that of serial performance, since the former can be affected by many new factors, such as data communication schemes, the number of processors used, and ordering and mapping techniques. Although the author is able to summarize results from data obtained after examining certain cases by experiment, two questions remain: (1) How to explain the results obtained? (2) How to extend the results from the certain cases to general cases? To answer these two questions quantitatively, the author introduces a tool called component analysis based on serial results. This component analysis is introduced because the iterative methods consist mainly of several basic functions such as linked triads, inner products, and triangular solves, which have different intrinsic parallelisms and are suitable for different parallel techniques. The parallel performance of each iterative method is first expressed as a weighted sum of the parallel performance of the basic functions that are the components of the method. Then, one separately examines the performance of basic functions and the weighting distributions of iterative methods, from which two independent sets of information are obtained when solving a given problem. In this component approach, all the weightings require only serial costs, not parallel costs, and each iterative method for solving a given problem is represented by its unique weighting distribution. The information given by the basic functions is independent of the iterative method, while that given by the weightings is independent of parallel technique, parallel machine, and number of processors.
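
    The component analysis amounts to a small calculation: with serial-cost weightings w_k for the basic functions and measured per-function speedups s_k, the predicted parallel time is (Σ_k w_k/s_k) of the serial time, so the method's overall speedup is the reciprocal of that sum. A sketch with hypothetical numbers:

        def predicted_speedup(weights, speedups):
            """Component analysis: with serial-time fractions w_k and measured
            per-component speedups s_k, parallel time is sum_k w_k / s_k of
            the serial time; the method's overall speedup is the reciprocal."""
            return 1.0 / sum(w / s for w, s in zip(weights, speedups))

        # Serial-cost weightings of one iterative method over its basic
        # functions (hypothetical): linked triads, inner products, tri-solves.
        w = [0.55, 0.15, 0.30]
        # Speedups of those components on a hypothetical 16-processor machine:
        s = [14.0, 9.0, 2.5]
        print(round(predicted_speedup(w, s), 2))   # the slow tri-solves dominate: ~5.7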

  15. Parallel genotypic adaptation: when evolution repeats itself

    PubMed Central

    Wood, Troy E.; Burke, John M.; Rieseberg, Loren H.

    2008-01-01

    Until recently, parallel genotypic adaptation was considered unlikely because phenotypic differences were thought to be controlled by many genes. There is increasing evidence, however, that phenotypic variation sometimes has a simple genetic basis and that parallel adaptation at the genotypic level may be more frequent than previously believed. Here, we review evidence for parallel genotypic adaptation derived from a survey of the experimental evolution, phylogenetic, and quantitative genetic literature. The most convincing evidence of parallel genotypic adaptation comes from artificial selection experiments involving microbial populations. In some experiments, up to half of the nucleotide substitutions found in independent lineages under uniform selection are the same. Phylogenetic studies provide a means for studying parallel genotypic adaptation in non-experimental systems, but conclusive evidence may be difficult to obtain because homoplasy can arise for other reasons. Nonetheless, phylogenetic approaches have provided evidence of parallel genotypic adaptation across all taxonomic levels, not just microbes. Quantitative genetic approaches also suggest parallel genotypic evolution across both closely and distantly related taxa, but it is important to note that this approach cannot distinguish between parallel changes at homologous loci versus convergent changes at closely linked non-homologous loci. The finding that parallel genotypic adaptation appears to be frequent and occurs at all taxonomic levels has important implications for phylogenetic and evolutionary studies. With respect to phylogenetic analyses, parallel genotypic changes, if common, may result in faulty estimates of phylogenetic relationships. From an evolutionary perspective, the occurrence of parallel genotypic adaptation provides increasing support for determinism in evolution and may provide a partial explanation for how species with low levels of gene flow are held together. PMID:15881688

  16. Parallel processing and pipelining usher DSP model into the future

    SciTech Connect

    Kampen, T.V.; Anders, P.

    1986-02-20

    The course of digital signal processing is well plotted into the future. When standard microprocessors, constrained by their von Neumann architectures and weak arithmetic ability, proved inadequate for the task, the first specialized chips appeared. The devices were fortified with Harvard-like parallel architecture, multiplication-accumulation hardware, and instruction pipelines. In the next stage of their evolution, DSP chips will have to rely on faster IC technologies and even greater degrees of parallel operation. Two versions of such a DSP chip are planned, one with and one without data ROM and program memory, and the latter has been cast in silicon. The processor relies on high-speed CMOS technology and a parallel architecture to start an instruction every 125 ns. The chip has the processing power to handle many of the most sophisticated DSP algorithms needed in telecommunications, speech and image processing, and general industry applications. As a result, either version can replace multiple ICs in current designs, affording a single-chip solution that makes many applications practical for the first time. Moreover, its flexible I/O structure qualifies the chip for the multiple processor configurations that offer still more signal-processing power. Architecturally, twin 16-bit data buses, X and Y, connect five functional sections within the chip, all working in parallel. The sections include a 16-bit multiplier and 40-bit accumulator, an ALU teamed with a multiport register file, and combined data memory and address computation logic. Rounding out the chip's functional foundation are a versatile program control section and 16-bit serial and parallel I/O circuits.

  17. TSE computers - A means for massively parallel computations

    NASA Technical Reports Server (NTRS)

    Strong, J. P., III

    1976-01-01

    A description is presented of hardware concepts for building a massively parallel processing system for two-dimensional data. The processing system is to use logic arrays of 128 x 128 elements which perform over 16 thousand operations simultaneously. Attention is given to image data, logic arrays, basic image logic functions, a prototype negator, an interleaver device, image logic circuits, and an image memory circuit.

  18. Parallel algorithms for simulating continuous time Markov chains

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
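
    Uniformization itself is compact: choose a rate Λ ≥ max_i |q_ii|, form the DTMC P = I + Q/Λ, and drive it with a Poisson(Λt) number of events; the shared Poisson event clock is what provides the synchronization structure the parallel methods exploit. A serial sketch (not one of the paper's five parallel methods):

        import numpy as np

        def uniformized_step(q, t, x0, rng):
            """Simulate a CTMC with generator q for time t via uniformization:
            a DTMC p = I + q/lam driven by a Poisson(lam * t) event count."""
            lam = max(-q[i, i] for i in range(len(q)))   # uniformization rate
            p = np.eye(len(q)) + q / lam                 # DTMC transition matrix
            x = x0
            for _ in range(rng.poisson(lam * t)):        # pre-sampled event count
                x = rng.choice(len(q), p=p[x])           # may include self-loops
            return x

        q = np.array([[-2.0, 2.0, 0.0],
                      [1.0, -3.0, 2.0],
                      [0.0, 4.0, -4.0]])
        rng = np.random.default_rng(1)
        states = [uniformized_step(q, 5.0, 0, rng) for _ in range(2000)]
        print(np.bincount(states, minlength=3) / 2000)   # ~ (0.25, 0.5, 0.25)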

  19. Open parabosonic string theory between two parallel Dp-branes

    SciTech Connect

    Hamam, D.; Belaloui, N.

    2012-06-27

    We investigate an open parabosonic string theory between two parallel Dp-branes. The spectrum is constructed and the partition function is derived. The expansion of the partition function is shown to match the degeneracy of the states at each mass level. The theory is consistent and contains no tachyon. The Virasoro algebra is derived and compared to that of the ordinary case.

  20. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  1. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    NASA Astrophysics Data System (ADS)

    Lu, Liuyan; Lantz, Steven R.; Ren, Zhuyin; Pope, Stephen B.

    2009-08-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel

  2. Parallel-End-Point Drafting Compass

    NASA Technical Reports Server (NTRS)

    Cronander, J.

    1986-01-01

    Parallelogram linkage ensures greater accuracy in drafting and scribing. Two members of arm of compass remain parallel for all angles pair makes with hub axis. They maintain opposing end members in parallelism. Parallelogram-linkage principle used on dividers as well as on compasses.

  3. EPIC: E-field Parallel Imaging Correlator

    NASA Astrophysics Data System (ADS)

    Thyagarajan, Nithyanandan; Beardsley, Adam P.; Bowman, Judd D.; Morales, Miguel F.

    2015-11-01

    E-field Parallel Imaging Correlator (EPIC), a highly parallelized Object Oriented Python package, implements the Modular Optimal Frequency Fourier (MOFF) imaging technique. It also includes visibility-based imaging using the software holography technique and a simulator for generating electric fields from a sky model. EPIC can accept dual-polarization inputs and produce images of all four instrumental cross-polarizations.

  4. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  5. MULTIOBJECTIVE PARALLEL GENETIC ALGORITHM FOR WASTE MINIMIZATION

    EPA Science Inventory

    In this research we have developed an efficient multiobjective parallel genetic algorithm (MOPGA) for waste minimization problems. This MOPGA integrates PGAPack (Levine, 1996) and NSGA-II (Deb, 2000) with novel modifications. PGAPack is a master-slave parallel implementation of a...

  6. Bayer image parallel decoding based on GPU

    NASA Astrophysics Data System (ADS)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In photoelectrical tracking systems, Bayer images are traditionally decoded on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: the first is a serial part, the second a task-parallel part, and the last a data-parallel part including inverse quantization, the inverse discrete wavelet transform (IDWT), and image post-processing. To reduce the execution time, the task-parallel part is optimized with OpenMP techniques. The data-parallel part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization, and texture memory optimization. In particular, the IDWT can be significantly sped up by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3- to 5-fold speed increase compared to the serial CPU method.

  7. Parallel algorithms for interactive manipulation of digital terrain models

    NASA Technical Reports Server (NTRS)

    Davis, E. W.; Mcallister, D. F.; Nagaraj, V.

    1988-01-01

    Interactive three-dimensional graphics applications, such as terrain data representation and manipulation, require extensive arithmetic processing. Massively parallel machines are attractive for this application since they offer high computational rates, and grid connected architectures provide a natural mapping for grid based terrain models. Presented here are algorithms for data movement on the Massively Parallel Processor (MPP) in support of pan and zoom functions over large data grids. This is an extension of earlier work that demonstrated real-time performance of graphics functions on grids that were equal in size to the physical dimensions of the MPP. When the dimensions of a data grid exceed the processing array size, data is packed in the array memory. Windows of the total data grid are interactively selected for processing. Movement of packed data is needed to distribute items across the array for efficient parallel processing. Execution time for data movement was found to exceed that for the arithmetic aspects of the graphics functions. Performance figures are given for routines written in MPP Pascal.

  8. National Combustion Code: Parallel Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.

    2000-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.

  9. Parallel language support on shared memory multiprocessors

    SciTech Connect

    Sah, A.

    1991-01-01

    The study of general purpose parallel computing requires efficient and inexpensive platforms for parallel program execution. This helps in ascertaining tradeoff choices between hardware complexity and software solutions for massively parallel systems design. In this paper, the authors present an implementation of an efficient parallel execution model on shared memory multiprocessors based on a Threaded Abstract Machine. The authors discuss a k-way generalized locking strategy suitable for our model. The authors study the performance gains obtained by a queuing strategy which uses multiple queues with reduced access contention. The authors also present performance models in shared memory machines, related to lock contention and serialization in shared memory allocation. A bin-based memory management technique which reduces the serialization is presented. These issues are critical for obtaining an efficient parallel execution environment.

  10. Differences Between Distributed and Parallel Systems

    SciTech Connect

    Brightwell, R.; Maccabe, A.B.; Riesen, R.

    1998-10-01

    Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.

  11. Parallel Algebraic Multigrid Methods - High Performance Preconditioners

    SciTech Connect

    Yang, U M

    2004-11-11

    The development of high performance, massively parallel computers and the increasing demands of computationally challenging applications have necessitated the development of scalable solvers and preconditioners. One of the most effective ways to achieve scalability is the use of multigrid or multilevel techniques. Algebraic multigrid (AMG) is a very efficient algorithm for solving large problems on unstructured grids. While much of it can be parallelized in a straightforward way, some components of the classical algorithm, particularly the coarsening process and some of the most efficient smoothers, are highly sequential, and require new parallel approaches. This chapter presents the basic principles of AMG and gives an overview of various parallel implementations of AMG, including descriptions of parallel coarsening schemes and smoothers, some numerical results as well as references to existing software packages.
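
    A minimal two-grid cycle for a 1-D Poisson problem, sketching the smoother plus coarse-grid-correction structure that AMG generalises; this toy coarsens geometrically and solves the coarse system densely, whereas real AMG builds the coarse space algebraically from the matrix.

    ```python
    import numpy as np

    # Two-grid cycle for 1-D Poisson: weighted Jacobi smoothing plus a
    # coarse-grid correction, the structure AMG generalises to
    # unstructured problems (geometric coarsening used here for brevity).
    n = 127
    A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1))
    b = np.ones(n)
    x = np.zeros(n)

    def smooth(x, sweeps=3, w=2.0 / 3.0):
        for _ in range(sweeps):              # weighted Jacobi: cheap and
            x = x + w * (b - A @ x) / 2.0    # naturally parallelizable
        return x

    nc = (n - 1) // 2                        # coarse grid size
    R = np.zeros((nc, n))                    # full-weighting restriction
    for i in range(nc):
        R[i, 2 * i:2 * i + 3] = [0.25, 0.5, 0.25]
    Pr = 2.0 * R.T                           # linear interpolation
    Ac = R @ A @ Pr                          # Galerkin coarse operator

    for _ in range(20):                      # two-grid V-cycles
        x = smooth(x)                                      # pre-smooth
        x = x + Pr @ np.linalg.solve(Ac, R @ (b - A @ x))  # coarse correction
        x = smooth(x)                                      # post-smooth

    print("residual norm:", np.linalg.norm(b - A @ x))
    ```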

  12. Configuration space representation in parallel coordinates

    NASA Technical Reports Server (NTRS)

    Fiorini, Paolo; Inselberg, Alfred

    1989-01-01

    By means of a system of parallel coordinates, a nonprojective mapping from R^N to R^2 is obtained for any positive integer N. In this way multivariate data and relations can be represented in the Euclidean plane (embedded in the projective plane). Basically, R^2 with Cartesian coordinates is augmented by N parallel axes, one for each variable. The N joint variables of a robotic device can be represented graphically by using parallel coordinates. It is pointed out that some properties of the relation are better perceived visually from the parallel coordinate representation, and that new algorithms and data structures can be obtained from this representation. The main features of parallel coordinates are described, and an example is presented of their use for configuration space representation of a mechanical arm (where Cartesian coordinates cannot be used).
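
    A short sketch of the construction; the six-joint arm and its ranges are invented for illustration. Each configuration in R^6 becomes a polyline across six parallel vertical axes, one axis per joint variable.

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    # Each configuration of a (hypothetical) 6-dof arm becomes a polyline
    # across six parallel vertical axes, one per joint variable.
    rng = np.random.default_rng(0)
    configs = rng.uniform(-np.pi, np.pi, size=(30, 6))   # 30 configurations

    # normalise each axis to [0, 1] so different joint ranges share a plot
    lo, hi = configs.min(axis=0), configs.max(axis=0)
    norm = (configs - lo) / (hi - lo)

    for row in norm:
        plt.plot(range(6), row, alpha=0.5)
    plt.xticks(range(6), [f"q{i + 1}" for i in range(6)])
    plt.title("Arm configurations as polylines in parallel coordinates")
    plt.show()
    ```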

  13. A paradigm for parallel unstructured grid generation

    SciTech Connect

    Gaither, A.; Marcum, D.; Reese, D.

    1996-12-31

    In this paper, a sequential 2D unstructured grid generator based on iterative point insertion and local reconnection is coupled with a Delaunay tessellation domain decomposition scheme to create a scalable parallel unstructured grid generator. The Message Passing Interface (MPI) is used for distributed communication in the parallel grid generator. This work attempts to provide a generic framework to enable the parallelization of fast sequential unstructured grid generators in order to compute grand-challenge scale grids for Computational Field Simulation (CFS). Motivation for moving from sequential to scalable parallel grid generation is presented. Delaunay tessellation and iterative point insertion and local reconnection (advancing front method only) unstructured grid generation techniques are discussed, with emphasis on how these techniques can be utilized for parallel unstructured grid generation. Domain decomposition techniques are discussed for both Delaunay and advancing front unstructured grid generation, with emphasis placed on the differences needed for both grid quality and algorithmic efficiency.
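
    A sketch of the decomposition idea only, not the paper's generator: a coarse Delaunay tessellation partitions the domain, and each simplex becomes a subdomain that an independent process could grid; scipy and a plain loop stand in for the MPI machinery.

    ```python
    import numpy as np
    from scipy.spatial import Delaunay

    # Coarse Delaunay tessellation as a domain decomposition: every simplex
    # is a subdomain whose interior points one worker could grid alone.
    rng = np.random.default_rng(0)
    coarse = rng.random((12, 2))             # coarse decomposition points
    tess = Delaunay(coarse)

    fine = rng.random((10000, 2))            # candidate fine-grid points
    owner = tess.find_simplex(fine)          # subdomain id per point
                                             # (-1 = outside the coarse hull)
    for s in range(len(tess.simplices)):
        pts = fine[owner == s]
        # each subdomain's points would feed one sequential grid generator
        print(f"subdomain {s}: {len(pts)} points")
    ```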

  14. Broadcasting a message in a parallel computer

    DOEpatents

    Berg, Jeremy E.; Faraj, Ahmad A.

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
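
    An illustrative sketch, not the patented implementation: a boustrophedon "snake" through a 2-D mesh is one simple Hamiltonian path, and broadcasting along it reduces to each node forwarding the message to exactly one successor over a point-to-point link.

    ```python
    # A boustrophedon "snake" is one easy Hamiltonian path through a 2-D
    # mesh; broadcasting along it means each node forwards the message to
    # exactly one successor over a point-to-point link.
    ROWS, COLS = 4, 5

    def snake_path(rows, cols):
        path = []
        for r in range(rows):
            cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
            path.extend((r, c) for c in cs)
        return path

    path = snake_path(ROWS, COLS)
    inbox = {node: None for node in path}
    inbox[path[0]] = "payload"               # the logical root's message
    for src, dst in zip(path, path[1:]):     # one hop per point-to-point send
        inbox[dst] = inbox[src]

    assert all(msg == "payload" for msg in inbox.values())
    ```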

  15. Randomized parallel speedups for list ranking

    SciTech Connect

    Vishkin, U.

    1987-06-01

    The following problem is considered: given a linked list of length n, compute the distance of each element of the linked list from the end of the list. The problem has two standard deterministic algorithms: a linear-time serial algorithm, and an O((n log n)/rho + log n)-time parallel algorithm using rho processors. The authors present a randomized parallel algorithm for the problem. The algorithm is designed for an exclusive-read exclusive-write parallel random access machine (EREW PRAM). It runs almost surely in time O(n/rho + log n log* n) using rho processors. Using a recently published parallel prefix sums algorithm, the list-ranking algorithm can be adapted to run on a concurrent-read concurrent-write parallel random access machine (CRCW PRAM) almost surely in time O(n/rho + log n) using rho processors.
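
    For context, a sketch of the classic deterministic pointer-jumping scheme that the randomized algorithm improves on; numpy's vectorised gathers play the role of the PRAM's synchronous parallel steps.

    ```python
    import numpy as np

    # Deterministic pointer jumping for list ranking: in O(log n) rounds,
    # every element learns its distance to the end of the list.
    n = 10
    order = np.random.default_rng(1).permutation(n)   # a random linked list
    succ = np.empty(n, dtype=int)
    succ[order[:-1]] = order[1:]
    succ[order[-1]] = order[-1]                       # the tail points to itself
    rank = np.where(succ == np.arange(n), 0, 1)       # distance to the end

    for _ in range(int(np.ceil(np.log2(n)))):
        rank = rank + rank[succ]    # every element acts "simultaneously"
        succ = succ[succ]           # pointers double their reach each round

    assert sorted(rank) == list(range(n))             # ranks 0 .. n-1
    ```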

  16. Implementation and performance of parallel Prolog interpreter

    SciTech Connect

    Wei, S.; Kale, L.V.; Balkrishna, R. . Dept. of Computer Science)

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines including shared memory systems (an Alliant FX/8, a Sequent, and a Multimax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.

  17. Instant well-log inversion with a parallel computer

    SciTech Connect

    Kimminau, S.J.; Trivedi, H.

    1993-08-01

    Well-log analysis requires several vectors of input data to be inverted with a physical model that produces more vectors of output data. The problem is inherently suited to either vectorization or parallelization. PLATO (parallel log analysis, timely output) is a research prototype system that uses a parallel architecture computer with memory-mapped graphics to invert vector data and display the result rapidly. By combining this high-performance computing and display system with a graphical user interface, the analyst can interact with the system in "real time" and can visualize the result of changing parameters on up to 1,000 levels of computed volumes and reconstructed logs. It is expected that such "instant" inversion will remove the main disadvantages frequently cited for simultaneous analysis methods, namely difficulty in assessing sensitivity to different parameters and slow output response. Although the prototype system uses highly specific features of a parallel processor, a subsequent version has been implemented on a conventional (serial) workstation with less performance but adequate functionality to preserve the apparently instant response. PLATO demonstrates the feasibility of petroleum computing applications combining an intuitive graphical interface, high-performance computing of physical models, and real-time output graphics.

  18. A Portable MPI-Based Parallel Vector Template Library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  19. A portable MPI-based parallel vector template library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  20. Parallel implementation of electronic structure energy, gradient, and Hessian calculations.

    PubMed

    Lotrich, V; Flocke, N; Ponton, M; Yau, A D; Perera, A; Deumens, E; Bartlett, R J

    2008-05-21

    ACES III is a newly written program in which the computationally demanding components of the computational chemistry code ACES II [J. F. Stanton et al., Int. J. Quantum Chem. 526, 879 (1992); ACES II program system, University of Florida, 1994] have been redesigned and implemented in parallel. The high-level algorithms include Hartree-Fock (HF) self-consistent field (SCF), second-order many-body perturbation theory [MBPT(2)] energy, gradient, and Hessian, and coupled cluster singles, doubles, and perturbative triples [CCSD(T)] energy and gradient. For SCF, MBPT(2), and CCSD(T), both restricted HF and unrestricted HF reference wave functions are available. For MBPT(2) gradients and Hessians, a restricted open-shell HF reference is also supported. The methods are programmed in a special language designed for the parallelization project. The language is called super instruction assembly language (SIAL). The design uses an extreme form of object-oriented programming. All compute-intensive operations, such as tensor contractions and diagonalizations, all communication operations, and all input-output operations are handled by a parallel program written in C and FORTRAN 77. This parallel program, called the super instruction processor (SIP), interprets and executes the SIAL program. By separating the algorithmic complexity (in SIAL) from the complexities of execution on computer hardware (in SIP), a software system is created that allows for very effective optimization and tuning on different hardware architectures with quite manageable effort. PMID:18500853

  1. Parallel implementation of electronic structure energy, gradient, and Hessian calculations

    NASA Astrophysics Data System (ADS)

    Lotrich, V.; Flocke, N.; Ponton, M.; Yau, A. D.; Perera, A.; Deumens, E.; Bartlett, R. J.

    2008-05-01

    ACES III is a newly written program in which the computationally demanding components of the computational chemistry code ACES II [J. F. Stanton et al., Int. J. Quantum Chem. 526, 879 (1992); ACES II program system, University of Florida, 1994] have been redesigned and implemented in parallel. The high-level algorithms include Hartree-Fock (HF) self-consistent field (SCF), second-order many-body perturbation theory [MBPT(2)] energy, gradient, and Hessian, and coupled cluster singles, doubles, and perturbative triples [CCSD(T)] energy and gradient. For SCF, MBPT(2), and CCSD(T), both restricted HF and unrestricted HF reference wave functions are available. For MBPT(2) gradients and Hessians, a restricted open-shell HF reference is also supported. The methods are programmed in a special language designed for the parallelization project. The language is called super instruction assembly language (SIAL). The design uses an extreme form of object-oriented programming. All compute-intensive operations, such as tensor contractions and diagonalizations, all communication operations, and all input-output operations are handled by a parallel program written in C and FORTRAN 77. This parallel program, called the super instruction processor (SIP), interprets and executes the SIAL program. By separating the algorithmic complexity (in SIAL) from the complexities of execution on computer hardware (in SIP), a software system is created that allows for very effective optimization and tuning on different hardware architectures with quite manageable effort.
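
    A loose analogy to the SIAL/SIP split, with instruction names and toy tensors invented for illustration: the algorithm is expressed as data, a list of high-level instructions, and a small engine dispatches each one to an optimised kernel, so the algorithm text is untouched when the execution backend changes.

    ```python
    import numpy as np

    # The "program" is plain data; a tiny engine (standing in for SIP)
    # dispatches each instruction to an optimised kernel.
    KERNELS = {
        "contract": lambda a, b: np.einsum("ij,jk->ik", a, b),
        "diagonalize": lambda a: np.linalg.eigh(a)[0],
    }

    program = [
        ("contract", "T1", "T2", "T3"),       # T3 = T1 . T2
        ("diagonalize", "T3", None, "eps"),   # eps = eigenvalues of T3
    ]

    def run(program, tensors):
        for op, in1, in2, out in program:
            args = [tensors[k] for k in (in1, in2) if k is not None]
            tensors[out] = KERNELS[op](*args)
        return tensors

    t = {"T1": np.eye(3), "T2": np.diag([1.0, 2.0, 3.0])}
    print(run(program, t)["eps"])             # [1. 2. 3.]
    ```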

  2. Numerical analysis of electrical defibrillation. The parallel approach.

    PubMed

    Ng, K T; Hutchinson, S A; Gao, S

    1995-01-01

    Numerical modeling offers a viable tool for studying electrical defibrillation, allowing the behavior of field quantities to be observed easily as the different system parameters are varied. One numerical technique, namely the finite-element method, has been found particularly effective for modeling complex thoracic anatomies. However, an accurate finite-element model of the thorax often requires a large number of elements and nodes, leading to a large set of equations that cannot be solved effectively with the computational power of conventional computers. This is especially true if many finite-element solutions need to be achieved within a reasonable time period (e.g., electrode configuration optimization). In this study, the use of massively parallel computers to provide the memory and reduction in solution time for solving these large finite-element problems is discussed. Both the uniform and unstructured grid approaches are considered. Algorithms that allow efficient mapping of uniform and unstructured grids to data-parallel and message-passing parallel computers are discussed. An automatic iterative procedure for electrode configuration optimization is presented. The procedure is based on the minimization of an objective function using the parallel direct search technique. Computational performance results are presented together with simulation results. PMID:8656104
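
    A sketch of a parallel direct (pattern) search in the spirit of the procedure described; the quadratic objective and all names are illustrative stand-ins for an expensive finite-element evaluation.

    ```python
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    # Pattern search needs no derivatives, so all poll points of the
    # current pattern can be evaluated concurrently.
    def objective(x):
        x = np.asarray(x, dtype=float)
        return float(np.sum((x - 1.0) ** 2))

    def pattern_search(x0, step=1.0, tol=1e-6):
        x = np.asarray(x0, dtype=float)
        fx, d = objective(x), len(x)
        dirs = np.vstack([np.eye(d), -np.eye(d)])       # coordinate pattern
        with ProcessPoolExecutor() as pool:
            while step > tol:
                polls = [x + step * v for v in dirs]
                vals = list(pool.map(objective, polls)) # parallel poll step
                i = int(np.argmin(vals))
                if vals[i] < fx:
                    x, fx = polls[i], vals[i]           # accept best poll point
                else:
                    step *= 0.5                         # shrink the pattern
        return x, fx

    if __name__ == "__main__":
        print(pattern_search([4.0, -3.0]))              # converges near (1, 1)
    ```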

  3. p-MEMPSODE: Parallel and irregular memetic global optimization

    NASA Astrophysics Data System (ADS)

    Voglis, C.; Hadjidoukas, P. E.; Parsopoulos, K. E.; Papageorgiou, D. G.; Lagaris, I. E.; Vrahatis, M. N.

    2015-12-01

    A parallel memetic global optimization algorithm suitable for shared memory multicore systems is proposed and analyzed. The considered algorithm combines two well-known and widely used population-based stochastic algorithms, namely Particle Swarm Optimization and Differential Evolution, with two efficient and parallelizable local search procedures. The sequential version of the algorithm was first introduced as MEMPSODE (MEMetic Particle Swarm Optimization and Differential Evolution) and published in the CPC program library. We exploit the inherent and highly irregular parallelism of the memetic global optimization algorithm by means of a dynamic and multilevel approach based on the OpenMP tasking model. In our case, tasks correspond to local optimization procedures or simple function evaluations. Parallelization occurs at each iteration step of the memetic algorithm without affecting its searching efficiency. For the same random seed, the proposed implementation reaches the same solution regardless of whether it is executed sequentially or in parallel. Extensive experimental evaluation has been performed in order to illustrate the speedup achieved on a shared-memory multicore server.
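
    One way to get the reproducibility property the abstract claims, sketched with invented details (p-MEMPSODE's actual mechanism may differ): seed each task's random stream from the candidate's index, so the outcome does not depend on how tasks are scheduled.

    ```python
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    # Per-task RNG streams make the parallel run bitwise-identical to the
    # sequential one, regardless of scheduling order.
    def local_search(args):
        idx, x = args
        rng = np.random.default_rng(1234 + idx)   # per-task stream, never shared
        best = x
        for _ in range(100):                      # toy stochastic refinement
            trial = best + 0.1 * rng.standard_normal(len(best))
            if np.sum(trial ** 2) < np.sum(best ** 2):
                best = trial
        return best

    swarm = [np.ones(3) * i for i in range(8)]
    serial = [local_search(a) for a in enumerate(swarm)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        parallel = list(pool.map(local_search, enumerate(swarm)))
    assert all(np.allclose(s, p) for s, p in zip(serial, parallel))
    ```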

  4. Parallel heat transport in integrable and chaotic magnetic fields

    SciTech Connect

    Del-Castillo-Negrete, Diego B; Chacon, Luis

    2012-01-01

    The study of transport in magnetized plasmas is a problem of fundamental interest in controlled fusion, space plasmas, and astrophysics research. Three issues make this problem particularly challenging: (i) the extreme anisotropy between the parallel (i.e., along the magnetic field), χ∥, and the perpendicular, χ⊥, conductivities (χ∥/χ⊥ may exceed 10^10 in fusion plasmas); (ii) magnetic field line chaos, which in general complicates (and may preclude) the construction of magnetic field line coordinates; and (iii) nonlocal parallel transport in the limit of small collisionality. Motivated by these issues, we present a Lagrangian Green's function method to solve the local and non-local parallel transport equation applicable to integrable and chaotic magnetic fields in arbitrary geometry. The method avoids by construction the numerical pollution issues of grid-based algorithms. The potential of the approach is demonstrated with nontrivial applications to integrable (magnetic island chain), weakly chaotic (devil's staircase), and fully chaotic magnetic field configurations. For the latter, numerical solutions of the parallel heat transport equation show that the effective radial transport, with local and non-local closures, is non-diffusive, thus casting doubt on the applicability of quasilinear diffusion descriptions. General conditions for the existence of non-diffusive, multivalued flux-gradient relations in the temperature evolution are derived.

  5. Parallel Midbrain Microcircuits Perform Independent Temporal Transformations

    PubMed Central

    Huguenard, John; Knudsen, Eric

    2014-01-01

    The capacity to select the most important information and suppress distracting information is crucial for survival. The midbrain contains a network critical for the selection of the strongest stimulus for gaze and attention. In avians, the optic tectum (OT; called the superior colliculus in mammals) and the GABAergic nucleus isthmi pars magnocellularis (Imc) cooperate in the selection process. In the chicken, OT layer 10, located in intermediate layers, responds to afferent input with gamma periodicity (25–75 Hz), measured at the level of individual neurons and the local field potential. In contrast, Imc neurons, which receive excitatory input from layer 10 neurons, respond with tonic, unusually high discharge rates (>150 spikes/s). In this study, we reveal the source of this high-rate inhibitory activity: layer 10 neurons that project to the Imc possess specialized biophysical properties that enable them to transform afferent drive into high firing rates (∼130 spikes/s), whereas neighboring layer 10 neurons, which project elsewhere, transform afferent drive into lower-frequency, periodic discharge patterns. Thus, the intermediate layers of the OT contain parallel, intercalated microcircuits that generate different temporal patterns of activity linked to the functions of their respective downstream targets. PMID:24920618

  6. Parallel midbrain microcircuits perform independent temporal transformations.

    PubMed

    Goddard, C Alex; Huguenard, John; Knudsen, Eric

    2014-06-11

    The capacity to select the most important information and suppress distracting information is crucial for survival. The midbrain contains a network critical for the selection of the strongest stimulus for gaze and attention. In avians, the optic tectum (OT; called the superior colliculus in mammals) and the GABAergic nucleus isthmi pars magnocellularis (Imc) cooperate in the selection process. In the chicken, OT layer 10, located in intermediate layers, responds to afferent input with gamma periodicity (25-75 Hz), measured at the level of individual neurons and the local field potential. In contrast, Imc neurons, which receive excitatory input from layer 10 neurons, respond with tonic, unusually high discharge rates (>150 spikes/s). In this study, we reveal the source of this high-rate inhibitory activity: layer 10 neurons that project to the Imc possess specialized biophysical properties that enable them to transform afferent drive into high firing rates (~130 spikes/s), whereas neighboring layer 10 neurons, which project elsewhere, transform afferent drive into lower-frequency, periodic discharge patterns. Thus, the intermediate layers of the OT contain parallel, intercalated microcircuits that generate different temporal patterns of activity linked to the functions of their respective downstream targets. PMID:24920618

  7. Heavy ion acceleration at parallel shocks

    NASA Astrophysics Data System (ADS)

    Galinsky, V. L.; Shevchenko, V. I.

    2010-11-01

    A study of alpha particle acceleration at a parallel shock due to an interaction with Alfvén waves self-consistently excited in both upstream and downstream regions was conducted using a scale-separation model (Galinsky and Shevchenko, 2000, 2007). The model uses conservation laws and resonance conditions to find where waves will be generated or damped and hence where particles will be pitch-angle scattered. It considers the total distribution function (for the bulk plasma and high energy tail), so no standard assumptions (e.g. seed populations, or some ad-hoc escape rate of accelerated particles) are required. The heavy ion scattering on hydromagnetic turbulence generated by both protons and ions themselves is considered. The contribution of alpha particles to turbulence generation is important because of their relatively large mass-loading parameter P_α = n_α m_α / (n_p m_p) (where m_p, n_p and m_α, n_α are the proton and alpha particle mass and density), which defines the efficiency of wave excitation. The energy spectra of alpha particles are found and compared with those obtained in the test particle approximation.

  8. Highly Parallel, High-Precision Numerical Integration

    SciTech Connect

    Bailey, David H.; Borwein, Jonathan M.

    2005-04-22

    This paper describes a scheme for rapidly computing numerical values of definite integrals to very high accuracy, ranging from ordinary machine precision to hundreds or thousands of digits, even for functions with singularities or infinite derivatives at endpoints. Such a scheme is of interest not only in computational physics and computational chemistry, but also in experimental mathematics, where high-precision numerical values of definite integrals can be used to numerically discover new identities. This paper discusses techniques for a parallel implementation of this scheme, then presents performance results for 1-D and 2-D test suites. Results are also given for a certain problem from mathematical physics, which features a difficult singularity, confirming a conjecture to 20,000 digit accuracy. The performance rate for this latter calculation on 1024 CPUs is 690 Gflop/s. We believe that this and one other 20,000-digit integral evaluation that we report are the highest-precision non-trivial numerical integrations performed to date.
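
    A toy sketch of the general idea only, not the paper's algorithm: split the interval, evaluate each piece with mpmath's high-precision quadrature, and sum. Threads keep the sketch self-contained; real speedups need process- or MPI-level parallelism, since mpmath is pure Python.

    ```python
    import mpmath
    from concurrent.futures import ThreadPoolExecutor

    # Each worker integrates one subinterval at high precision; the exact
    # total of 4/(1+t^2) over [0, 1] is pi, which makes checking easy.
    mpmath.mp.dps = 50                      # work with 50 significant digits

    def piece(k, n=8):
        a, b = mpmath.mpf(k) / n, mpmath.mpf(k + 1) / n
        return mpmath.quad(lambda t: 4 / (1 + t**2), [a, b])

    with ThreadPoolExecutor() as pool:
        total = sum(pool.map(piece, range(8)))

    print(total)        # agrees with mpmath.pi to ~50 digits
    ```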

  9. Parallel Processing in Face Perception

    ERIC Educational Resources Information Center

    Martens, Ulla; Leuthold, Hartmut; Schweinberger, Stefan R.

    2010-01-01

    The authors examined face perception models with regard to the functional and temporal organization of facial identity and expression analysis. Participants performed a manual 2-choice go/no-go task to classify faces, where response hand depended on facial familiarity (famous vs. unfamiliar) and response execution depended on facial expression…

  10. Parallelism extraction and program restructuring for parallel simulation of digital systems

    SciTech Connect

    Vellandi, B.L.

    1990-01-01

    Two topics currently of interest to the computer-aided design (CAD) community for very-large-scale integrated (VLSI) circuits are using the VHSIC Hardware Description Language (VHDL) effectively and decreasing simulation times of VLSI designs through parallel execution of the simulator. The goal of this research is to increase the degree of parallelism obtainable in VHDL simulation, and consequently to decrease simulation times. The research targets simulation on massively parallel architectures. Experimentation and instrumentation were done on the SIMD Connection Machine. The author discusses her method used to extract parallelism and restructure a VHDL program, experimental results using this method, and requirements for a parallel architecture for fast simulation.

  11. Improved packing of protein side chains with parallel ant colonies

    PubMed Central

    2014-01-01

    Introduction: The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, protein design, and ligand docking applications. Many existing solutions are modelled as a computational optimisation problem. Besides the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. Methods: We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamers by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. Results: We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of χ1 and 77.11% of χ1+2 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods such as CIS-RR and SCWRL4, and analysed the results from different perspectives, in terms of whole protein chains and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamer strategy. All results confirmed that our parallel approach is competitive with state-of-the-art solutions for packing side chains. Conclusions: This parallel approach combines various sources of searching intelligence and energy
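
    A toy sketch of the single shared pheromone matrix idea on a small travelling-salesman instance, not pacoPacker's side-chain energies; all sizes and names are invented. Several ant threads read the matrix freely and serialise their deposits through one lock.

    ```python
    import threading
    import numpy as np

    # Several ant threads cooperate through one shared pheromone matrix,
    # serialising deposits with a lock.
    rng = np.random.default_rng(0)
    n = 6
    dist = rng.uniform(1, 10, (n, n))
    np.fill_diagonal(dist, np.inf)
    pher = np.ones((n, n))                  # the single shared matrix
    lock = threading.Lock()

    def ant_colony(seed, iters=50):
        r = np.random.default_rng(seed)
        for _ in range(iters):
            tour, unvisited = [0], set(range(1, n))
            while unvisited:                # build a tour city by city
                cand = list(unvisited)
                w = pher[tour[-1], cand] / dist[tour[-1], cand]
                nxt = int(r.choice(cand, p=w / w.sum()))
                tour.append(nxt)
                unvisited.remove(nxt)
            length = sum(dist[a, b] for a, b in zip(tour, tour[1:] + [0]))
            with lock:                      # cooperative deposit + evaporation
                for a, b in zip(tour, tour[1:] + [0]):
                    pher[a, b] += 1.0 / length
                pher *= 0.995

    threads = [threading.Thread(target=ant_colony, args=(s,)) for s in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("strongest trail:", np.unravel_index(np.argmax(pher), pher.shape))
    ```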

  12. Parallel distance matrix computation for Matlab data mining

    NASA Astrophysics Data System (ADS)

    Skurowski, Przemysław; Staniszewski, Michał

    2016-06-01

    The paper presents utility functions for computing a distance matrix, which plays a crucial role in data mining. The design goal was to enable operating on relatively large datasets by overcoming the basic shortcoming, computing time, with an easy-to-use interface. The presented solution is a set of functions created with emphasis on practical applicability in real life. The proposed solution is presented alongside the theoretical background for the performance scaling. Furthermore, different approaches to parallel computing are analyzed, including shared memory, which is uncommon in the Matlab environment.
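
    A sketch of the chunked strategy such utilities typically use, written in Python rather than the paper's Matlab (sizes are invented): the distance matrix is built one block of rows at a time, which bounds memory and yields natural units of parallel work.

    ```python
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    # Chunked pairwise Euclidean distances: each block of rows is an
    # independent unit of work handed to a thread.
    X = np.random.default_rng(0).random((2000, 16))
    BLOCK = 250

    def block_rows(i):
        chunk = X[i:i + BLOCK]                            # (BLOCK, 16)
        # broadcasting computes all distances from this chunk to X;
        # numpy's compiled loops largely release the GIL, so threads overlap
        d2 = ((chunk[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        return np.sqrt(d2)

    with ThreadPoolExecutor() as pool:
        D = np.vstack(list(pool.map(block_rows, range(0, len(X), BLOCK))))

    assert D.shape == (2000, 2000) and np.allclose(np.diag(D), 0.0)
    ```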

  13. Parallel computation of manipulator inverse dynamics

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1991-01-01

    In this article, parallel computation of manipulator inverse dynamics is investigated. A hierarchical graph-based mapping approach is devised to analyze the inherent parallelism in the Newton-Euler formulation at several computational levels, and to derive the features of an abstract architecture for exploitation of parallelism. At each level, a parallel algorithm represents the application of a parallel model of computation that transforms the computation into a graph whose structure defines the features of an abstract architecture, i.e., number of processors, communication structure, etc. Data-flow analysis is employed to derive the time lower bound in the computation as well as the sequencing of the abstract architecture. The features of the target architecture are defined by optimization of the abstract architecture to exploit maximum parallelism while minimizing architectural complexity. An architecture is designed and implemented that is capable of efficient exploitation of parallelism at several computational levels. The computation time of the Newton-Euler formulation for a 6-degree-of-freedom (dof) general manipulator is measured as 187 microsec. The increase in computation time for each additional dof is 23 microsec, which leads to a computation time of less than 500 microsec, even for a 12-dof redundant arm.

  14. On the Scalability of Parallel UCT

    NASA Astrophysics Data System (ADS)

    Segal, Richard B.

    The parallelization of MCTS across multiple machines has proven surprisingly difficult. The limitations of existing algorithms were evident in the 2009 Computer Olympiad, where Zen, using a single four-core machine, defeated both Fuego with ten eight-core machines and Mogo with twenty 32-core machines. This paper investigates the limits of parallel MCTS in order to understand why distributed parallelism has proven so difficult and to pave the way towards future distributed algorithms with better scaling. We first analyze the single-threaded scaling of Fuego and find that there is an upper bound on the play-quality improvements which can come from additional search. We then analyze the scaling of an idealized N-core shared memory machine to determine the maximum amount of parallelism supported by MCTS. We show that parallel speedup depends critically on how much time is given to each player. We use this relationship to predict parallel scaling for time scales beyond what can be empirically evaluated due to the immense computation required. Our results show that MCTS can scale nearly perfectly to at least 64 threads when combined with virtual loss, but that without virtual loss scaling is limited to just eight threads. We also find that for competition time controls, scaling to thousands of threads is impossible, not necessarily because MCTS fails to scale, but because high levels of parallelism start to bump up against the upper performance bound of Fuego itself.
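
    A minimal single-process sketch of virtual loss, the mechanism the paper credits for near-perfect scaling to 64 threads: before descending into a child, a thread books a lost playout there, steering other threads toward different children until the real result is backed up. Class and function names are invented.

    ```python
    import math

    # Virtual loss in UCB-based selection: a provisional "lost" visit makes
    # a child less attractive to concurrently-selecting threads.
    class Node:
        def __init__(self):
            self.visits, self.wins = 0, 0.0

    def ucb(child, parent_visits, c=1.4):
        if child.visits == 0:
            return float("inf")
        return child.wins / child.visits + c * math.sqrt(
            math.log(parent_visits) / child.visits)

    def select_with_virtual_loss(children, parent_visits):
        best = max(children, key=lambda ch: ucb(ch, parent_visits))
        best.visits += 1      # virtual loss: a visit with no win recorded,
        return best           # lowering this child's UCB for other threads

    def backup(node, won):
        node.wins += 1.0 if won else 0.0   # the virtual visit already counted

    children = [Node(), Node(), Node()]
    first = select_with_virtual_loss(children, parent_visits=10)
    second = select_with_virtual_loss(children, parent_visits=10)
    assert first is not second  # the virtual loss diverted the second "thread"
    backup(first, won=True)     # later, the real playout result replaces it
    ```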

  15. Performance of the Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance I/O to applications that access data in patterns that have been observed to be common.

  16. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools, developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. It is an interactive toolkit that transforms a serial Fortran application code into an equivalent parallel version of the software in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes, ranging from benchmarks to real-world applications, is then presented. This demonstrates the great potential of using the toolkit to quickly parallelize serial programs, as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally, a set of tutorials is included for hands-on experience with this toolkit.

  17. Run-time recognition of task parallelism within the P++ parallel array class library

    SciTech Connect

    Parsons, R.; Quinlan, D.

    1993-11-01

    This paper explores the use of a run-time system to recognize task parallelism within a C++ array class library. Run-time systems currently support data parallelism in P++, FORTRAN 90 D, and High Performance FORTRAN. But data parallelism is insufficient for many applications, including adaptive mesh refinement. Without access to both data and task parallelism, such applications exhibit several orders of magnitude more message passing and poor performance. In this work, a C++ array class library is used to implement deferred evaluation and run-time dependence analysis for task parallelism recognition, obtaining task parallelism through a data flow interpretation of data parallel array statements. Performance results show that the analysis and optimizations are both efficient and practical, allowing us to consider more substantial optimizations.
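
    A sketch of the deferred-evaluation idea with invented classes (P++'s actual machinery is far richer): array statements build expression nodes instead of executing, and a later pass walks the recorded graph, submitting independent subtrees (here, the two additions) as parallel tasks.

    ```python
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    # Array statements record a dataflow graph; evaluation walks the graph
    # and runs independent subtrees as concurrent tasks.
    class Lazy:
        def __init__(self, op, *deps, value=None):
            self.op, self.deps, self.value = op, deps, value

        def __add__(self, other):
            return Lazy(np.add, self, other)

        def __mul__(self, other):
            return Lazy(np.multiply, self, other)

    def evaluate(node, pool):
        if node.value is not None:          # a leaf array, already materialized
            return node.value
        futures = [pool.submit(evaluate, d, pool) for d in node.deps]
        node.value = node.op(*[f.result() for f in futures])
        return node.value

    a = Lazy(None, value=np.ones(4))
    b = Lazy(None, value=np.ones(4))
    expr = (a + b) * (a + a)                # nothing is computed yet
    with ThreadPoolExecutor(max_workers=8) as pool:
        print(evaluate(expr, pool))         # [4. 4. 4. 4.]
    ```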

  18. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As hardware and software technologies have progressed, the performance of parallel programs written with compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  19. Xyce parallel electronic simulator : users' guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; (2) improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase: a message passing parallel implementation, which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique

  20. Parallel Processing of Broad-Band PPM Signals

    NASA Technical Reports Server (NTRS)

    Gray, Andrew; Kang, Edward; Lay, Norman; Vilnrotter, Victor; Srinivasan, Meera; Lee, Clement

    2010-01-01

    A parallel-processing algorithm and a hardware architecture to implement the algorithm have been devised for time-slot synchronization in the reception of pulse-position-modulated (PPM) optical or radio signals. As in the cases of some prior algorithms and architectures for parallel, discrete-time, digital processing of signals other than PPM, an incoming broadband signal is divided into multiple parallel narrower-band signals by means of sub-sampling and filtering. The number of parallel streams is chosen so that the frequency content of the narrower-band signals is low enough to enable processing by relatively low-speed complementary metal-oxide-semiconductor (CMOS) electronic circuitry. The algorithm and architecture are intended to satisfy requirements for time-varying time-slot synchronization and post-detection filtering, with correction of timing errors independent of estimation of timing errors. They are also intended to afford flexibility for dynamic reconfiguration and upgrading. The architecture is implemented in a reconfigurable CMOS processor in the form of a field-programmable gate array. The algorithm and its hardware implementation incorporate three separate time-varying filter banks for three distinct functions: correction of sub-sample timing errors, post-detection filtering, and post-detection estimation of timing errors. The design of the filter bank for correction of timing errors, the method of estimating timing errors, and the design of a feedback-loop filter are governed by a host of parameters, the most critical one, with regard to processing very broadband signals with CMOS hardware, being the number of parallel streams (equivalently, the rate-reduction parameter).
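
    A sketch of the front-end idea the abstract describes, with a toy filter and invented sizes (the receiver's actual filter designs are not reproduced here): sub-sample the broadband stream into M parallel phases, each carried at 1/M the rate, so slower hardware can process every branch concurrently.

    ```python
    import numpy as np

    # Polyphase split: branch m gets samples m, m+M, m+2M, ... along with
    # the matching decimated sub-filter of the prototype, so each branch
    # runs at 1/M of the input rate.
    M = 4                                    # number of parallel streams
    h = np.hanning(16)
    h /= h.sum()                             # toy low-pass prototype filter
    x = np.random.default_rng(0).standard_normal(1024)   # broadband input

    branches = [np.convolve(x[m::M], h[m::M]) for m in range(M)]

    for m, y in enumerate(branches):         # every branch runs at 1/M rate
        print(f"branch {m}: {len(x[m::M])} inputs -> {len(y)} outputs")
    ```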