Science.gov

Sample records for not4p functions parallel

  1. Functional & para-functional parallel processing

    SciTech Connect

    Not Available

    1994-11-01

    For years (about 20, in fact) dataflow researchers have argued for the use of dataflow (a subset of functional) languages for parallel computing, resting their proof on the ability to construct large-scale dataflow machines to realize the inherent parallelism in functional programs. Unfortunately, such machines have never materialized as commercial products - instead, the market shows a vast variety of parallel multiprocessors that require special skills to program. It may be the case that these machines reflect a wrong direction in computer architecture design, and it may be the case that dataflow machines are the right way to go, but the proof is in the pudding, and thus far there does not exist even a prototype dataflow machine that can prove the "dataflow thesis." Under the circumstances it would seem rather foolhardy simply to ignore the commercial parallel machines that are available now, regardless of one's favorite programming methodology or concurrency model. It has been the authors' thesis that one can in fact use such machines effectively, while maintaining the concomitant thesis that functional programming is good for parallel computation. During the last two years the author has made considerable progress in supporting this two-fold thesis and is now prepared to extend this work in several ways. The authors' particular interest, and presumably the primary interest to DOE, is to concentrate the work in the area of scientific computing, including functional language features, program development tools, and systems support tailored for scientific computing applications. The authors' desire to do this reflects confidence that this approach really will work for scientific computing - the author has spent two years proving the viability of the ideas, and now it's time to put them into action.

  2. Functional MRI Using Regularized Parallel Imaging Acquisition

    PubMed Central

    Lin, Fa-Hsuan; Huang, Teng-Yi; Chen, Nan-Kuei; Wang, Fu-Nien; Stufflebeam, Steven M.; Belliveau, John W.; Wald, Lawrence L.; Kwong, Kenneth K.

    2013-01-01

    Parallel MRI techniques reconstruct full-FOV images from undersampled k-space data by using the uncorrelated information from RF array coil elements. One disadvantage of parallel MRI is that the image signal-to-noise ratio (SNR) is degraded because of the reduced data samples and the spatially correlated nature of multiple RF receivers. Regularization has been proposed to mitigate the SNR loss originating from the latter cause. Since regularization requires a static prior, the dynamic contrast-to-noise ratio (CNR) in parallel MRI will be affected. In this paper we investigate the CNR of regularized sensitivity encoding (SENSE) acquisitions. We propose to implement regularized parallel MRI acquisitions in functional MRI (fMRI) experiments by incorporating the prior from combined segmented echo-planar imaging (EPI) acquisition into SENSE reconstructions. We investigated the impact of regularization on the CNR by performing parametric simulations at various BOLD contrasts, acceleration rates, and sizes of the active brain areas. As quantified by receiver operating characteristic (ROC) analysis, the simulations suggest that the detection power of SENSE fMRI can be improved by regularized reconstructions, compared to unregularized reconstructions. Human motor and visual fMRI data acquired at different field strengths and array coils also demonstrate that regularized SENSE improves the detection of functionally active brain regions. PMID:16032694
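
    For orientation, the regularized SENSE estimator can be written in its standard Tikhonov form. The notation below is assumed for illustration and is the textbook formulation, not necessarily the exact estimator implemented in this paper:

```latex
% Tikhonov-regularized SENSE reconstruction (standard form; notation assumed):
%   d      : undersampled (aliased) multi-coil k-space data
%   S      : coil sensitivity encoding matrix
%   \Psi   : receiver noise covariance matrix
%   \rho_0 : static prior image,  \lambda : regularization weight
\hat{\rho} = \rho_0
  + \left( S^{H}\Psi^{-1}S + \lambda^{2}I \right)^{-1}
    S^{H}\Psi^{-1}\left( d - S\rho_0 \right)
```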

  3. Highly parallel oligonucleotide purification and functionalization using reversible chemistry

    PubMed Central

    York, Kerri T.; Smith, Ryan C.; Yang, Rob; Melnyk, Peter C.; Wiley, Melissa M.; Turk, Casey M.; Ronaghi, Mostafa; Gunderson, Kevin L.; Steemers, Frank J.

    2012-01-01

    We have developed a cost-effective, highly parallel method for purification and functionalization of 5′-labeled oligonucleotides. The approach is based on 5′-hexa-His phase tag purification, followed by exchange of the hexa-His tag for a functional group using reversible reaction chemistry. These methods are suitable for large-scale (micromole to millimole) production of oligonucleotides and are amenable to highly parallel processing of many oligonucleotides individually or in high complexity pools. Examples of the preparation of 5′-biotin, 95-mer oligonucleotide pools of >40K complexity at micromole scale are shown. These pools are prepared in up to ~16% yield and 90–99% purity. Approaches for using this method in other applications are also discussed. PMID:22039155

  4. Highly parallel oligonucleotide purification and functionalization using reversible chemistry.

    PubMed

    York, Kerri T; Smith, Ryan C; Yang, Rob; Melnyk, Peter C; Wiley, Melissa M; Turk, Casey M; Ronaghi, Mostafa; Gunderson, Kevin L; Steemers, Frank J

    2012-01-01

    We have developed a cost-effective, highly parallel method for purification and functionalization of 5'-labeled oligonucleotides. The approach is based on 5'-hexa-His phase tag purification, followed by exchange of the hexa-His tag for a functional group using reversible reaction chemistry. These methods are suitable for large-scale (micromole to millimole) production of oligonucleotides and are amenable to highly parallel processing of many oligonucleotides individually or in high complexity pools. Examples of the preparation of 5'-biotin, 95-mer oligonucleotide pools of >40K complexity at micromole scale are shown. These pools are prepared in up to ~16% yield and 90-99% purity. Approaches for using this method in other applications are also discussed. PMID:22039155

  5. Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments

    NASA Astrophysics Data System (ADS)

    Atwal, Gurinder S.; Kinney, Justin B.

    2016-03-01

    A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
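
    The two inference criteria contrasted above can be stated compactly. The formulation below is a schematic sketch with assumed notation (sequences s_i, measurements y_i, model activity f(s; θ)), not the authors' exact formalism:

```latex
% Likelihood-based versus mutual-information-based inference (schematic):
\hat{\theta}_{\mathrm{lik}} = \arg\max_{\theta} \sum_{i} \log p\!\left( y_i \mid f(s_i;\theta) \right),
\qquad
\hat{\theta}_{\mathrm{MI}} = \arg\max_{\theta} \; I\!\left[ f(s;\theta) \, ; \, y \right]
% Because mutual information is invariant under invertible transformations of f,
% parameter directions that merely reparameterize f ("diffeomorphic modes")
% are left unconstrained by the data.
```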

  6. Administering truncated receive functions in a parallel messaging interface

    SciTech Connect

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-12-09

    Administering truncated receive functions in a parallel messaging interface ('PMI') of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.
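
    A toy sketch of the claimed receive semantics appears below. The class and method names are invented for illustration; the point is only the division of labor: the full payload is received by the messaging layer, the application sees just the portion it requested, and the remainder is discarded inside the layer.

```python
# A minimal sketch (hypothetical API) of the truncated-receive behavior
# described above; not the patented implementation.
class PMI:
    """Toy stand-in for a parallel messaging interface on one compute node."""

    def __init__(self):
        self._inbox = []

    def send(self, data: bytes, dest: "PMI") -> None:
        # The full quantity of data always crosses the network.
        dest._inbox.append(data)

    def truncated_recv(self, keep: int) -> bytes:
        # Receive all of the data, hand back only the first `keep` bytes,
        # and discard the rest inside the messaging layer.
        data = self._inbox.pop(0)
        kept, _discarded = data[:keep], data[keep:]
        return kept

source, dest = PMI(), PMI()
source.send(b"header:payload-that-the-app-does-not-need", dest)
print(dest.truncated_recv(keep=7))  # b'header:'
```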

  7. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    SciTech Connect

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings: High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  8. Parallel functional programming in Sisal: Fictions, facts, and future

    SciTech Connect

    McGraw, J.R.

    1993-07-01

    This paper provides a status report on the progress of research and development on the functional language Sisal. This project focuses on providing a highly effective method of writing large scientific applications that can efficiently execute on a spectrum of different multiprocessors. The paper includes sections on the language definition, compilation strategies, and programming techniques intended for readers with little or no background with Sisal. The section on performance presents our most recent results on execution speed for shared-memory multiprocessors, our findings using Sisal to develop codes, and our experiences migrating the same source code to different machines. For large programs, the execution performance of Sisal (with minimal supporting advice from the programmer) usually exceeds that of the best available automatic, vector/parallel Fortran compilers. Our evidence also indicates that Sisal programs tend to be shorter in length, faster to write, and clearer to understand than equivalent algorithms in Fortran. The paper concludes with a substantial discussion of common criticisms of the language and our plans for addressing them. Most notably, efficient implementations for distributed memory machines are lacking; an issue we plan to remedy.

  9. Functional networks in parallel with cortical development associate with executive functions in children.

    PubMed

    Zhong, Jidan; Rifkin-Graboi, Anne; Ta, Anh Tuan; Yap, Kar Lai; Chuang, Kai-Hsiang; Meaney, Michael J; Qiu, Anqi

    2014-07-01

    Children begin performing similarly to adults on tasks requiring executive functions in late childhood, a transition that is probably due to neuroanatomical fine-tuning processes, including myelination and synaptic pruning. In parallel to such structural changes in neuroanatomical organization, development of functional organization may also be associated with cognitive behaviors in children. We examined 6- to 10-year-old children's cortical thickness, functional organization, and cognitive performance. We used structural magnetic resonance imaging (MRI) to identify areas with cortical thinning, resting-state fMRI to identify functional organization in parallel to cortical development, and working memory/response inhibition tasks to assess executive functioning. We found that neuroanatomical changes in the form of cortical thinning spread over bilateral frontal, parietal, and occipital regions. These regions were engaged in 3 functional networks: sensorimotor and auditory, executive control, and default mode network. Furthermore, we found that working memory and response inhibition only associated with regional functional connectivity, but not topological organization (i.e., local and global efficiency of information transfer) of these functional networks. Interestingly, functional connections associated with "bottom-up" as opposed to "top-down" processing were more clearly related to children's performance on working memory and response inhibition, implying an important role for brain systems involved in late childhood. PMID:23448875

  10. A two-level parallel direct search implementation for arbitrarily sized objective functions

    SciTech Connect

    Hutchinson, S.A.; Shadid, N.; Moffat, H.K.

    1994-12-31

    In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p₂) of processors. If p, the number of processors available, is greater than or equal to 2p₂, then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.
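
    The two-level scheme can be illustrated with a small serial stand-in. The sketch below assumes p = 8 processors and an objective best run on p₂ = 2 of them, so p/p₂ = 4 trial points are evaluated concurrently; it is not the authors' PDS implementation.

```python
# A minimal sketch of the two-level idea described above (assumed parameters):
# with p processors and an objective that runs best on p2 of them, the p // p2
# processor groups evaluate distinct direct-search trial points concurrently.
from concurrent.futures import ProcessPoolExecutor

P, P2 = 8, 2                 # assumed: 8 processors, 2 per objective evaluation
GROUPS = P // P2             # trial points evaluated simultaneously

def objective(x):
    # Stand-in for an expensive (e.g., finite-element) evaluation that would
    # itself be spread over P2 processors.
    return (x - 3.0) ** 2

def pds_step(center, step):
    # One direct-search iteration: probe a stencil of trial points in parallel.
    trials = [center + step * k for k in range(-GROUPS // 2, GROUPS // 2 + 1) if k]
    with ProcessPoolExecutor(max_workers=GROUPS) as pool:
        values = list(pool.map(objective, trials))
    best_val, best_x = min(zip(values, trials))
    return best_x if best_val < objective(center) else center

if __name__ == "__main__":
    x, step = 0.0, 1.0
    for _ in range(20):
        new_x = pds_step(x, step)
        x, step = new_x, step if new_x != x else step / 2
    print(round(x, 3))  # converges toward 3.0
```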

  11. Efficient time-dependent density functional theory approximations for hybrid density functionals: analytical gradients and parallelization.

    PubMed

    Petrenko, Taras; Kossmann, Simone; Neese, Frank

    2011-02-01

    In this paper, we present the implementation of efficient approximations to time-dependent density functional theory (TDDFT) within the Tamm-Dancoff approximation (TDA) for hybrid density functionals. For the calculation of the TDDFT/TDA excitation energies and analytical gradients, we combine the resolution of identity (RI-J) algorithm for the computation of the Coulomb terms and the recently introduced "chain of spheres exchange" (COSX) algorithm for the calculation of the exchange terms. It is shown that for extended basis sets, the RIJCOSX approximation leads to speedups of up to 2 orders of magnitude compared to traditional methods, as demonstrated for hydrocarbon chains. The accuracy of the adiabatic transition energies, excited state structures, and vibrational frequencies is assessed on a set of 27 excited states for 25 molecules with the configuration interaction singles and hybrid TDDFT/TDA methods using various basis sets. Compared to the canonical values, the typical error in transition energies is of the order of 0.01 eV. Similar to the ground-state results, excited state equilibrium geometries differ by less than 0.3 pm in the bond distances and 0.5° in the bond angles from the canonical values. The typical error in the calculated excited state normal coordinate displacements is of the order of 0.01, and relative error in the calculated excited state vibrational frequencies is less than 1%. The errors introduced by the RIJCOSX approximation are, thus, insignificant compared to the errors related to the approximate nature of the TDDFT methods and basis set truncation. For TDDFT/TDA energy and gradient calculations on Ag-TB2-helicate (156 atoms, 2732 basis functions), it is demonstrated that the COSX algorithm parallelizes almost perfectly (speedup ~26-29 for 30 processors). The exchange-correlation terms also parallelize well (speedup ~27-29 for 30 processors). The solution of the Z-vector equations shows a speedup of ~24 on 30 processors.

  12. Efficient time-dependent density functional theory approximations for hybrid density functionals: Analytical gradients and parallelization

    NASA Astrophysics Data System (ADS)

    Petrenko, Taras; Kossmann, Simone; Neese, Frank

    2011-02-01

    In this paper, we present the implementation of efficient approximations to time-dependent density functional theory (TDDFT) within the Tamm-Dancoff approximation (TDA) for hybrid density functionals. For the calculation of the TDDFT/TDA excitation energies and analytical gradients, we combine the resolution of identity (RI-J) algorithm for the computation of the Coulomb terms and the recently introduced "chain of spheres exchange" (COSX) algorithm for the calculation of the exchange terms. It is shown that for extended basis sets, the RIJCOSX approximation leads to speedups of up to 2 orders of magnitude compared to traditional methods, as demonstrated for hydrocarbon chains. The accuracy of the adiabatic transition energies, excited state structures, and vibrational frequencies is assessed on a set of 27 excited states for 25 molecules with the configuration interaction singles and hybrid TDDFT/TDA methods using various basis sets. Compared to the canonical values, the typical error in transition energies is of the order of 0.01 eV. Similar to the ground-state results, excited state equilibrium geometries differ by less than 0.3 pm in the bond distances and 0.5° in the bond angles from the canonical values. The typical error in the calculated excited state normal coordinate displacements is of the order of 0.01, and relative error in the calculated excited state vibrational frequencies is less than 1%. The errors introduced by the RIJCOSX approximation are, thus, insignificant compared to the errors related to the approximate nature of the TDDFT methods and basis set truncation. For TDDFT/TDA energy and gradient calculations on Ag-TB2-helicate (156 atoms, 2732 basis functions), it is demonstrated that the COSX algorithm parallelizes almost perfectly (speedup ~26-29 for 30 processors). The exchange-correlation terms also parallelize well (speedup ~27-29 for 30 processors). The solution of the Z-vector equations shows a speedup of ~24 on 30 processors.

  13. Methods, systems, and computer program products for implementing function-parallel network firewall

    DOEpatents

    Fulp, Errin W.; Farley, Ryan J.

    2011-10-11

    Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.
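
    The partitioning idea reads naturally as code. Below is a toy sketch with invented rule predicates, a thread pool standing in for the separate firewall nodes, and an assumed combine policy (earliest portion wins), which mirrors a sequential first-match scan of the full rule set because portion 0 precedes portion 1 in rule order.

```python
# A toy sketch of a function-parallel firewall: each node holds only part of
# the rule set, every node inspects the same packet, verdicts are combined.
from concurrent.futures import ThreadPoolExecutor

RULES = [  # (predicate, action) pairs; first match wins within a portion
    (lambda p: p["port"] == 22, "DROP"),
    (lambda p: p["port"] == 80, "ACCEPT"),
    (lambda p: p["src"].startswith("10."), "ACCEPT"),
    (lambda p: True, "DROP"),  # default rule
]

# Partition the rule set: node 0 gets the first half, node 1 the rest.
PORTIONS = [RULES[:2], RULES[2:]]

def node_filter(portion, packet):
    for predicate, action in portion:
        if predicate(packet):
            return action
    return None  # no match in this node's portion

def firewall(packet):
    with ThreadPoolExecutor(max_workers=len(PORTIONS)) as pool:
        verdicts = pool.map(lambda portion: node_filter(portion, packet), PORTIONS)
    # Combine: earliest-portion match wins, preserving the global rule order.
    return next(v for v in verdicts if v is not None)

print(firewall({"src": "10.0.0.5", "port": 22}))  # DROP (node 0 matches first)
```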

  14. Parallel sites implicate functional convergence of the hearing gene prestin among echolocating mammals.

    PubMed

    Liu, Zhen; Qi, Fei-Yan; Zhou, Xin; Ren, Hai-Qing; Shi, Peng

    2014-09-01

    Echolocation is a sensory system whereby certain mammals navigate and forage using sound waves, usually in environments where visibility is limited. Curiously, echolocation has evolved independently in bats and whales, which occupy entirely different environments. Based on this phenotypic convergence, recent studies identified several echolocation-related genes with parallel sites at the protein sequence level among different echolocating mammals, and among these, prestin seems the most promising. Although previous studies analyzed the evolutionary mechanism of prestin, the functional roles of the parallel sites in the evolution of mammalian echolocation are not clear. By functional assays, we show that a key parameter of prestin function, 1/α, is increased in all echolocating mammals and that the N7T parallel substitution accounted for this functional convergence. Moreover, another parameter, V1/2, was shifted toward the depolarization direction in a toothed whale, the bottlenose dolphin (Tursiops truncatus), and a constant-frequency (CF) bat, Stoliczka's trident bat (Aselliscus stoliczkanus). The parallel site of I384T between toothed whales and CF bats was responsible for this functional convergence. Furthermore, the two parameters (1/α and V1/2) were correlated with mammalian high-frequency hearing, suggesting that the convergent changes of the prestin function in echolocating mammals may play important roles in mammalian echolocation. To our knowledge, these findings present the functional patterns of echolocation-related genes in echolocating mammals for the first time and rigorously demonstrate adaptive parallel evolution at the protein sequence level, paving the way to insights into the molecular mechanism underlying mammalian echolocation. PMID:24951728

  15. Serial and Parallel Attentive Visual Searches: Evidence from Cumulative Distribution Functions of Response Times

    ERIC Educational Resources Information Center

    Sung, Kyongje

    2008-01-01

    Participants searched a visual display for a target among distractors. Each of 3 experiments tested a condition proposed to require attention and for which certain models propose a serial search. Serial versus parallel processing was tested by examining effects on response time means and cumulative distribution functions. In 2 conditions, the…

  16. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    Charon is a software toolkit that enables engineers to develop high-performing message-passing programs in a convenient and piecemeal fashion. Emphasis is on rapid program development and prototyping. In this report a detailed description of the functional design of the toolkit is presented. It is illustrated by the stepwise parallelization of two representative code examples.

  17. Analysis and selection of optimal function implementations in massively parallel computer

    DOEpatents

    Archer, Charles Jens; Peters, Amanda; Ratterman, Joseph D.

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
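
    A minimal sketch of the collect-then-dispatch idea follows. The two implementations and the nearest-size lookup are invented stand-ins for the patent's performance data and generated selection code.

```python
# A minimal sketch (hypothetical functions): time a set of implementations
# across input sizes, then dispatch future calls to the empirically fastest
# implementation for each input regime.
import time

def impl_loop(xs):    return [x * x for x in xs]
def impl_builtin(xs): return list(map(lambda x: x * x, xs))

IMPLEMENTATIONS = [impl_loop, impl_builtin]

def profile(sizes):
    table = {}
    for n in sizes:
        xs = list(range(n))
        timings = []
        for impl in IMPLEMENTATIONS:
            t0 = time.perf_counter()
            impl(xs)
            timings.append((time.perf_counter() - t0, impl))
        table[n] = min(timings, key=lambda t: t[0])[1]  # fastest at this size
    return table

def make_dispatcher(table):
    sizes = sorted(table)
    def dispatch(xs):
        # Pick the profiled size closest to this input's size.
        n = min(sizes, key=lambda s: abs(s - len(xs)))
        return table[n](xs)
    return dispatch

square_all = make_dispatcher(profile([10, 10_000]))
print(square_all([1, 2, 3]))  # [1, 4, 9]
```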

  18. DGDFT: A massively parallel method for large scale density functional theory calculations

    SciTech Connect

    Hu, Wei Yang, Chao; Lin, Lin

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  19. Hyperspectral band selection based on parallel particle swarm optimization and impurity function band prioritization schemes

    NASA Astrophysics Data System (ADS)

    Chang, Yang-Lang; Liu, Jin-Nan; Chen, Yen-Lin; Chang, Wen-Yen; Hsieh, Tung-Ju; Huang, Bormin

    2014-01-01

    In recent years, satellite imaging technologies have resulted in an increased number of bands acquired by hyperspectral sensors, greatly advancing the field of remote sensing. Accordingly, owing to the increasing number of bands, band selection in hyperspectral imagery for dimension reduction is important. This paper presents a framework for band selection in hyperspectral imagery that uses two techniques, referred to as particle swarm optimization (PSO) band selection and the impurity function band prioritization (IFBP) method. With the PSO band selection algorithm, highly correlated bands of hyperspectral imagery can first be grouped into modules to coarsely reduce high-dimensional datasets. Then, these highly correlated band modules are analyzed with the IFBP method to finely select the most important feature bands from the hyperspectral imagery dataset. However, PSO band selection is a time-consuming procedure when the number of hyperspectral bands is very large. Hence, this paper proposes a parallel computing version of PSO, namely parallel PSO (PPSO), using a modern graphics processing unit (GPU) architecture with NVIDIA's compute unified device architecture technology to improve the computational speed of PSO processes. The natural parallelism of the proposed PPSO lies in the fact that each particle can be regarded as an independent agent. Parallel computation benefits the algorithm by providing each agent with a parallel processor. The intrinsic parallel characteristics embedded in PPSO are, therefore, suitable for parallel computation. The effectiveness of the proposed PPSO is evaluated through the use of airborne visible/infrared imaging spectrometer hyperspectral images. The performance of PPSO is validated using the supervised K-nearest neighbor classifier. The experimental results demonstrate that the proposed PPSO/IFBP band selection method can not only improve computational speed, but also offer a satisfactory classification performance.
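
    The particle-level parallelism is easy to picture as one vectorized update over the whole swarm. The NumPy sketch below stands in for the paper's CUDA implementation; the sphere objective and the PSO constants are assumptions for illustration.

```python
# A compact sketch of particle-level parallelism: every particle updates
# independently, so the whole swarm advances in one vectorized step
# (NumPy standing in for the paper's GPU implementation).
import numpy as np

rng = np.random.default_rng(0)
N, D, W, C1, C2 = 64, 5, 0.7, 1.5, 1.5          # swarm size, dims, PSO constants

def fitness(X):                                  # toy objective: sphere function
    return (X ** 2).sum(axis=1)

X = rng.uniform(-5, 5, (N, D)); V = np.zeros((N, D))
P, p_val = X.copy(), fitness(X)                  # per-particle bests
g = P[p_val.argmin()].copy()                     # global best

for _ in range(200):
    r1, r2 = rng.random((N, D)), rng.random((N, D))
    V = W * V + C1 * r1 * (P - X) + C2 * r2 * (g - X)   # all particles at once
    X = X + V
    f = fitness(X)
    better = f < p_val
    P[better], p_val[better] = X[better], f[better]
    g = P[p_val.argmin()].copy()

print(np.round(p_val.min(), 6))                  # approaches 0
```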

  20. Storing files in a parallel computing system based on user-specified parser function

    DOEpatents

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

    2014-10-21

    Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.
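
    The role of the user-supplied parser can be sketched as follows; all interfaces here are invented for illustration. The parser both enforces a semantic requirement (files that fail it are not stored) and extracts metadata that is kept alongside the stored data.

```python
# A schematic sketch (invented interfaces) of parser-based selective storage.
def parser(name, payload):
    """Application-supplied: return metadata if the file should be stored."""
    if not payload.startswith(b"#checkpoint"):    # semantic requirement
        return None
    return {"name": name, "bytes": len(payload)}  # extracted metadata

def store_files(files, parser, nodes):
    stored = []
    for i, (name, payload) in enumerate(files):
        meta = parser(name, payload)
        if meta is None:
            continue                              # parser rejected this file
        node = nodes[i % len(nodes)]              # simple placement policy
        stored.append((node, payload, meta))      # data and metadata together
    return stored

files = [("a.chk", b"#checkpoint step=1"), ("b.log", b"debug noise")]
for node, _, meta in store_files(files, parser, nodes=["osd0", "osd1"]):
    print(node, meta)   # only a.chk is stored, with its metadata
```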

  1. Structure-Function Relationships of Postnatal Tendon Development: A Parallel to Healing

    PubMed Central

    Connizzo, Brianne K.; Yannascoli, Sarah M.; Soslowsky, Louis J.

    2013-01-01

    This review highlights recent research on structure-function relationships in tendon and comments on the parallels between development and healing. The processes of tendon development and collagen fibrillogenesis are reviewed, but due to the abundance of information in this field, this work focuses primarily on characterizing the mechanical behavior of mature and developing tendon, and how the latter parallels healing tendon. The role that extracellular matrix components, mainly collagen, proteoglycans, and collagen cross-links, play in determining the mechanical behavior of tendon will be examined in this review. Specifically, collagen fiber re-alignment and collagen fibril uncrimping relate mechanical behavior to structural alterations during development and during healing. Finally, attention is paid to a number of recent efforts to augment injured tendon and how future efforts could focus on recreating the important structure-function relationships reviewed here. PMID:23357642

  2. Optimization of a parallel permutation testing function for the SPRINT R package

    PubMed Central

    Petrou, Savvas; Sloan, Terence M; Mewissen, Muriel; Forster, Thorsten; Piotrowski, Michal; Dobrzelecki, Bartosz; Ghazal, Peter; Trew, Arthur; Hill, Jon

    2011-01-01

    The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function has close to optimal scaling on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, and so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms including cloud resources and a common desktop machine with multiprocessing capabilities. Copyright © 2011 John Wiley & Sons, Ltd. PMID:23335858
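
    SPRINT itself is an R package, so the sketch below only illustrates the kind of embarrassingly parallel permutation test it distributes, with local worker processes standing in for supercomputer nodes; the two-sample difference-of-means statistic is an assumed example.

```python
# Permutation batches are independent, so they split cleanly across workers.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def perm_batch(args):
    a, b, n_perm, seed = args
    rng = np.random.default_rng(seed)
    pooled, n_a = np.concatenate([a, b]), len(a)
    observed = a.mean() - b.mean()
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(pooled[:n_a].mean() - pooled[n_a:].mean()) >= abs(observed)
    return hits

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.normal(0.5, 1, 30), rng.normal(0.0, 1, 30)
    batches = [(a, b, 2500, seed) for seed in range(4)]  # 4 workers x 2500 perms
    with ProcessPoolExecutor(max_workers=4) as pool:
        hits = sum(pool.map(perm_batch, batches))
    print("p =", hits / 10_000)
```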

  3. Optimization of a parallel permutation testing function for the SPRINT R package.

    PubMed

    Petrou, Savvas; Sloan, Terence M; Mewissen, Muriel; Forster, Thorsten; Piotrowski, Michal; Dobrzelecki, Bartosz; Ghazal, Peter; Trew, Arthur; Hill, Jon

    2011-12-10

    The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function has close to optimal scaling on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, and so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms including cloud resources and a common desktop machine with multiprocessing capabilities. Copyright © 2011 John Wiley & Sons, Ltd. PMID:23335858

  4. Micro/Nanoscale Parallel Patterning of Functional Biomolecules, Organic Fluorophores and Colloidal Nanocrystals

    NASA Astrophysics Data System (ADS)

    Sabella, S.; Brunetti, V.; Vecchio, G.; Torre, A. Della; Rinaldi, R.; Cingolani, R.; Pompa, P. P.

    2009-10-01

    We describe the design and optimization of a reliable strategy that combines self-assembly and lithographic techniques, leading to very precise micro-/nanopositioning of biomolecules for the realization of micro- and nanoarrays of functional DNA and antibodies. Moreover, based on the covalent immobilization of stable and versatile SAMs of programmable chemical reactivity, this approach constitutes a general platform for the parallel site-specific deposition of a wide range of molecules such as organic fluorophores and water-soluble colloidal nanocrystals.

  5. Parallel functional category deficits in clauses and nominal phrases: The case of English agrammatism

    PubMed Central

    Wang, Honglei; Yoshida, Masaya; Thompson, Cynthia K.

    2015-01-01

    Individuals with agrammatic aphasia exhibit restricted patterns of impairment of functional morphemes; however, the syntactic characterization of the impairment is controversial. Previous studies have focused on functional morphology in clauses only. This study extends the empirical domain by testing functional morphemes in English nominal phrases in aphasia and comparing patients’ impairment to their impairment of functional morphemes in English clauses. In the linguistics literature, it is assumed that clauses and nominal phrases are structurally parallel but exhibit inflectional differences. The results of the present study indicated that aphasic speakers evinced similar impairment patterns in clauses and nominal phrases. These findings are consistent with the Distributed Morphology Hypothesis (DMH), suggesting that the source of functional morphology deficits among agrammatics relates to difficulty implementing rules that convert inflectional features into morphemes. Our findings, however, are inconsistent with the Tree Pruning Hypothesis (TPH), which suggests that patients have difficulty building complex hierarchical structures. PMID:26379370

  6. A cost-effective methodology for the design of massively-parallel VLSI functional units

    NASA Technical Reports Server (NTRS)

    Venkateswaran, N.; Sriram, G.; Desouza, J.

    1993-01-01

    In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the performance of functional units designed by our method yields promising results.

  7. Parallel fixed point implementation of a radial basis function network in an FPGA.

    PubMed

    de Souza, Alisson C D; Fernandes, Marcelo A C

    2014-01-01

    This paper proposes a parallel fixed point radial basis function (RBF) artificial neural network (ANN), implemented in a field programmable gate array (FPGA) trained online with a least mean square (LMS) algorithm. The processing time and occupied area were analyzed for various fixed point formats. The problems of precision of the ANN response for nonlinear classification using the XOR gate and interpolation using the sine function were also analyzed in a hardware implementation. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA. PMID:25268918
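
    A floating-point reference sketch of the RBF-plus-LMS scheme is given below (the paper's actual contribution, the fixed-point FPGA mapping, is not reproduced). The centers, learning rate, and kernel width are assumed values; XOR is the nonlinear test case named in the abstract.

```python
# Gaussian RBF network with fixed centers; output weights trained online by LMS.
import numpy as np

centers = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # one unit per corner
w = np.zeros(len(centers)); lr, gamma = 0.2, 2.0             # assumed constants

def hidden(x):                       # radial basis activations
    return np.exp(-gamma * ((centers - x) ** 2).sum(axis=1))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)    # XOR: the paper's nonlinear test case

for _ in range(500):                 # online LMS: one sample per update
    for xi, yi in zip(X, y):
        h = hidden(xi)
        w += lr * (yi - w @ h) * h   # LMS update on the output layer only

print(np.round([w @ hidden(xi) for xi in X], 2))  # ~[0, 1, 1, 0]
```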

  8. Parallel Fixed Point Implementation of a Radial Basis Function Network in an FPGA

    PubMed Central

    de Souza, Alisson C. D.; Fernandes, Marcelo A. C.

    2014-01-01

    This paper proposes a parallel fixed point radial basis function (RBF) artificial neural network (ANN), implemented in a field programmable gate array (FPGA) trained online with a least mean square (LMS) algorithm. The processing time and occupied area were analyzed for various fixed point formats. The problems of precision of the ANN response for nonlinear classification using the XOR gate and interpolation using the sine function were also analyzed in a hardware implementation. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA. PMID:25268918

  9. Electromagnetic semitransparent δ-function plate: Casimir interaction energy between parallel infinitesimally thin plates

    NASA Astrophysics Data System (ADS)

    Parashar, Prachi; Milton, Kimball A.; Shajesh, K. V.; Schaden, M.

    2012-10-01

    We derive boundary conditions for electromagnetic fields on a δ-function plate. The optical properties of such a plate are shown to necessarily be anisotropic in that they only depend on the transverse properties of the plate. We unambiguously obtain the boundary conditions for a perfectly conducting δ-function plate in the limit of infinite dielectric response. We show that a material does not “optically vanish” in the thin-plate limit. The thin-plate limit of a plasma slab of thickness d with plasma frequency ωₚ² = ζₚ/d reduces to a δ-function plate for frequencies (ω = iζ) satisfying ζd ≪ ζₚd ≪ 1. We show that the Casimir interaction energy between two parallel perfectly conducting δ-function plates is the same as that for parallel perfectly conducting slabs. Similarly, we show that the interaction energy between an atom and a perfect electrically conducting δ-function plate is the usual Casimir-Polder energy, which is verified by considering the thin-plate limit of dielectric slabs. The “thick” and “thin” boundary conditions considered by Bordag are found to be identical in the sense that they lead to the same electromagnetic fields.
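
    For reference, the result the perfectly conducting δ-function plates are shown to reproduce is the standard Casimir interaction energy per unit area between two parallel, perfectly conducting planes separated by a distance a:

```latex
% Standard Casimir result for parallel perfect conductors (textbook value):
\frac{E}{A} = -\,\frac{\pi^{2}\hbar c}{720\,a^{3}}
```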

  10. A parallel approach of COFFEE objective function to multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.

    2015-09-01

    Computational tools to assist genomic analyses have become ever more necessary due to the fast increase in the amount of available data. Given the high computational costs of deterministic algorithms for sequence alignment, many works concentrate their efforts on the development of heuristic approaches to multiple sequence alignment. However, selecting an approach that offers solutions with good biological significance and feasible execution time is a great challenge. Thus, this work presents the parallelization of the processing steps of the MSA-GA tool, using the multithread paradigm in the execution of the COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sequence sets with low similarity are aligned. In previous studies, we therefore implemented the COFFEE objective function in the tool to smooth these distortions. Although the nature of the COFFEE objective function implies an increase in execution time, the approach contains steps that can be executed in parallel. With the improvements implemented in this work, the new approach is 24% faster than the sequential approach with COFFEE. Moreover, the multithreaded COFFEE approach is more efficient than WSP because, besides being slightly faster, it produces better biological results.
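
    The decomposition that makes COFFEE parallelizable can be sketched generically: the score is a sum of independent per-pair consistency terms, so sequence pairs can be farmed out to a thread pool and the partial scores reduced. The scoring stand-in and library format below are assumptions, not the MSA-GA source.

```python
# Generic sketch of a thread-parallel COFFEE-style consistency score.
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def pair_consistency(pair, alignment, library):
    # Stand-in scoring: fraction of aligned residue pairs of (i, j) that agree
    # with a precomputed pairwise library (assumed built beforehand).
    i, j = pair
    agree = sum(1 for col in zip(alignment[i], alignment[j])
                if col in library.get((i, j), set()))
    return agree / max(len(alignment[i]), 1)

def coffee_score(alignment, library, workers=4):
    pairs = list(combinations(range(len(alignment)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda p: pair_consistency(p, alignment, library), pairs)
    return sum(scores) / len(pairs)   # reduce the independent per-pair terms

alignment = ["AC-GT", "ACGGT", "AC-GT"]
library = {(0, 1): {("A", "A"), ("C", "C")}, (0, 2): {("A", "A")}, (1, 2): set()}
print(round(coffee_score(alignment, library), 3))
```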

  11. WSINV3DMT: Vertical magnetic field transfer function inversion and parallel implementation

    NASA Astrophysics Data System (ADS)

    Siripunvaraporn, Weerachai; Egbert, Gary

    2009-04-01

    We describe two extensions to the three-dimensional magnetotelluric inversion program WSINV3DMT (Siripunvaraporn, W., Egbert, G., Lenbury, Y., Uyeshima, M., 2005, Three-dimensional magnetotelluric inversion: data-space method. Phys. Earth Planet. Interiors 150, 3-14), including modifications to allow inversion of the vertical magnetic transfer functions (VTFs), and parallelization of the code. The parallel implementation, which is most appropriate for small clusters, uses MPI to distribute forward solutions for different frequencies, as well as some linear algebraic computations, over multiple processors. In addition to reducing run times, the parallelization reduces memory requirements by distributing storage of the sensitivity matrix. Both new features are tested on synthetic and real datasets, revealing nearly linear speedup for a small number of processors (up to 8). Experiments on synthetic examples show that the horizontal position and lateral conductivity contrasts of anomalies can be recovered by inverting VTFs alone. However, vertical positions and absolute amplitudes are not well constrained unless an accurate host resistivity is imposed a priori. On very simple synthetic models, including VTFs in a joint inversion had little impact on the inverse solution computed with impedances alone. However, in experiments with real data, inverse solutions obtained from joint inversion of VTF and impedances, and from impedances alone, differed in important ways, suggesting that for structures with more realistic levels of complexity the VTFs will in general provide useful additional constraints.
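
    The frequency-level distribution can be illustrated with a few lines of mpi4py (WSINV3DMT itself is a Fortran code; the frequency list and forward-solver stub below are invented):

```python
# Each rank solves the forward problem for its own subset of frequencies,
# and rank 0 gathers the responses.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

frequencies = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0]  # Hz, assumed band

def forward_solve(freq):
    # Stand-in for the 3-D EM forward solution at one frequency.
    return {"freq": freq, "response": 1.0 / freq}

# Round-robin assignment: rank r takes frequencies r, r+size, r+2*size, ...
mine = [forward_solve(f) for f in frequencies[rank::size]]
all_responses = comm.gather(mine, root=0)

if rank == 0:
    flat = [r for batch in all_responses for r in batch]
    print(f"{len(flat)} forward solutions gathered on rank 0")
# Run with: mpiexec -n 4 python forward_freqs.py
```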

  12. Finding zeros of nonlinear functions using the hybrid parallel cell mapping method

    NASA Astrophysics Data System (ADS)

    Xiong, Fu-Rui; Schütze, Oliver; Ding, Qian; Sun, Jian-Qiao

    2016-05-01

    Analysis of nonlinear dynamical systems including finding equilibrium states and stability boundaries often leads to a problem of finding zeros of vector functions. However, finding all the zeros of a set of vector functions in the domain of interest is quite a challenging task. This paper proposes a zero finding algorithm that combines the cell mapping methods and the subdivision techniques. Both the simple cell mapping (SCM) and generalized cell mapping (GCM) methods are used to identify a covering set of zeros. The subdivision technique is applied to enhance the solution resolution. The parallel implementation of the proposed method is discussed extensively. Several examples are presented to demonstrate the application and effectiveness of the proposed method. We then extend the study of finding zeros to the problem of finding stability boundaries of potential fields. Examples of two and three dimensional potential fields are studied. In addition to the effectiveness in finding the stability boundaries, the proposed method can handle several millions of cells in just a few seconds with the help of parallel computing in graphics processing units (GPUs).
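
    A serial miniature of the cover-and-subdivide idea appears below; the paper's cell mapping machinery and GPU parallelism are not reproduced. Cells whose corner values bracket zero are kept and split, so the surviving cells shrink onto the zero set (here, the unit circle):

```python
# Cover the domain with cells, keep cells where f changes sign, subdivide.
import numpy as np

def f(x, y):                       # example scalar function:
    return x**2 + y**2 - 1.0       # zeros on the unit circle

def refine(cells, h):
    kept = []
    for (x0, y0) in cells:
        corners = [f(x0 + dx, y0 + dy) for dx in (0, h) for dy in (0, h)]
        if min(corners) <= 0.0 <= max(corners):   # sign change: zero may cross
            kept.append((x0, y0))
    # Subdivide each kept cell into four children of half the size.
    h2 = h / 2
    children = [(x0 + dx, y0 + dy)
                for (x0, y0) in kept for dx in (0, h2) for dy in (0, h2)]
    return children, h2

cells = [(x, y) for x in np.arange(-2, 2, 0.5) for y in np.arange(-2, 2, 0.5)]
h = 0.5
for _ in range(6):
    cells, h = refine(cells, h)
print(len(cells), "cells of size", h, "cover the zero set")
```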

  13. Line-field parallel swept source MHz OCT for structural and functional retinal imaging

    PubMed Central

    Fechtig, Daniel J.; Grajciar, Branislav; Schmoll, Tilman; Blatter, Cedric; Werkmeister, Rene M.; Drexler, Wolfgang; Leitgeb, Rainer A.

    2015-01-01

    We demonstrate three-dimensional structural and functional retinal imaging with line-field parallel swept source imaging (LPSI) at acquisition speeds of up to 1 MHz equivalent A-scan rate with sensitivity better than 93.5 dB at a central wavelength of 840 nm. The results demonstrate competitive sensitivity, speed, image contrast and penetration depth when compared to conventional point scanning OCT. LPSI allows high-speed retinal imaging of function and morphology with commercially available components. We further demonstrate a method that mitigates the effect of the lateral Gaussian intensity distribution across the line focus and demonstrate and discuss the feasibility of high-speed optical angiography for visualization of the retinal microcirculation. PMID:25798298

  14. Line-field parallel swept source MHz OCT for structural and functional retinal imaging.

    PubMed

    Fechtig, Daniel J; Grajciar, Branislav; Schmoll, Tilman; Blatter, Cedric; Werkmeister, Rene M; Drexler, Wolfgang; Leitgeb, Rainer A

    2015-03-01

    We demonstrate three-dimensional structural and functional retinal imaging with line-field parallel swept source imaging (LPSI) at acquisition speeds of up to 1 MHz equivalent A-scan rate with sensitivity better than 93.5 dB at a central wavelength of 840 nm. The results demonstrate competitive sensitivity, speed, image contrast and penetration depth when compared to conventional point scanning OCT. LPSI allows high-speed retinal imaging of function and morphology with commercially available components. We further demonstrate a method that mitigates the effect of the lateral Gaussian intensity distribution across the line focus and demonstrate and discuss the feasibility of high-speed optical angiography for visualization of the retinal microcirculation. PMID:25798298

  15. Free minimization of the fundamental measure theory functional: Freezing of parallel hard squares and cubes.

    PubMed

    Belli, S; Dijkstra, M; van Roij, R

    2012-09-28

    Due to remarkable advances in colloid synthesis techniques, systems of squares and cubes, once an academic abstraction for theorists and simulators, are nowadays an experimental reality. By means of a free minimization of the free-energy functional, we apply fundamental measure theory to analyze the phase behavior of parallel hard squares and hard cubes. We compare our results with those obtained by the traditional approach based on the Gaussian parameterization, finding small deviations and good overall agreement between the two methods. For hard squares, our predictions feature at intermediate packing fraction a smectic phase, which is however expected to be unstable due to thermal fluctuations. Due to this inconsistency, we cannot determine unambiguously the prediction of the theory for the expected fluid-to-crystal transition of parallel hard squares, but we deduce two alternative scenarios: (i) a second-order transition with a coexisting vacancy-rich crystal or (ii) a higher-density first-order transition with a coexisting crystal characterized by a lower vacancy concentration. In accordance with previous studies, a second-order transition with a high vacancy concentration is predicted for hard cubes. PMID:23020342

  16. Adaptation to warmer climates by parallel functional evolution of CBF genes in Arabidopsis thaliana.

    PubMed

    Monroe, J Grey; McGovern, Cullen; Lasky, Jesse R; Grogan, Kelsi; Beck, James; McKay, John K

    2016-08-01

    The evolutionary processes and genetics underlying local adaptation at a specieswide level are largely unknown. Recent work has indicated that a frameshift mutation in a member of a family of transcription factors, C-repeat binding factors or CBFs, underlies local adaptation and freezing tolerance divergence between two European populations of Arabidopsis thaliana. To ask whether the specieswide evolution of CBF genes in Arabidopsis is consistent with local adaptation, we surveyed CBF variation from 477 wild accessions collected across the species' range. We found that CBF sequence variation is strongly associated with winter temperature variables. Looking specifically at the minimum temperature experienced during the coldest month, we found that Arabidopsis from warmer climates exhibit a significant excess of nonsynonymous polymorphisms in CBF genes and revealed a CBF haplotype network whose structure points to multiple independent transitions to warmer climates. We also identified a number of newly described mutations of significant functional effect in CBF genes, similar to the frameshift mutation previously indicated to be locally adaptive in Italy, and find that they are significantly associated with warm winters. Lastly, we uncover relationships between climate and the position of significant functional effect mutations between and within CBF paralogs, suggesting variation in adaptive function of different mutations. Cumulatively, these findings support the hypothesis that disruption of CBF gene function is adaptive in warmer climates, and illustrate how parallel evolution in a transcription factor can underlie adaptation to climate. PMID:27247130

  17. Implementation of the Turn Function Method in a three-dimensional, parallelized hydrodynamics code

    NASA Astrophysics Data System (ADS)

    O'Rourke, P. J.; Fairfield, M. S.

    1992-08-01

    The implementation of the Turn Function Method in KIVA-F90, a version of the KIVA computer program written in the FORTRAN 90 programming language and used on some massively parallel computers, is described. The Turn Function Method solves both linear momentum and vorticity equations in numerical calculations of compressible fluid flow. Solving a vorticity equation allows vorticity to be both conserved and transported more accurately than in traditional methods for computing compressible flow. This first implementation of the Turn Function Method in a three-dimensional hydrodynamics code involved some modification of the original method and some numerical difference approximations. In particular, a penalty method is used to keep the divergence of the computed vorticity field close to zero. Difference operators are also defined in such a way that the finite difference analog of ∇·(∇×u) = 0 is exactly satisfied. Three example problems show the increased computational cost and the accuracy to be gained by using the Turn Function Method in calculations of flows with rotational motion. Use of the method can increase the computational times of the Euler equation solver in KIVA-F90 by 60 percent, but it is concluded that this increased cost is justified by the increased accuracy.

  18. Leukocytosis and natural killer cell function parallel neurobehavioral fatigue induced by 64 hours of sleep deprivation.

    PubMed

    Dinges, D F; Douglas, S D; Zaugg, L; Campbell, D E; McMann, J M; Whitehouse, W G; Orne, E C; Kapoor, S C; Icaza, E; Orne, M T

    1994-05-01

    The hypothesis that sleep deprivation depresses immune function was tested in 20 adults, selected on the basis of their normal blood chemistry, monitored in a laboratory for 7 d, and kept awake for 64 h. At 2200 h each day measurements were taken of total leukocytes (WBC), monocytes, granulocytes, lymphocytes, eosinophils, erythrocytes (RBC), B and T lymphocyte subsets, activated T cells, and natural killer (NK) subpopulations (CD56/CD8 dual-positive cells, CD16-positive cells, CD57-positive cells). Functional tests included NK cytotoxicity, lymphocyte stimulation with mitogens, and DNA analysis of cell cycle. Sleep loss was associated with leukocytosis and increased NK cell activity. At the maximum sleep deprivation, increases were observed in counts of WBC, granulocytes, monocytes, NK activity, and the proportion of lymphocytes in the S phase of the cell cycle. Changes in monocyte counts correlated with changes in other immune parameters. Counts of CD4, CD16, CD56, and CD57 lymphocytes declined after one night without sleep, whereas CD56 and CD57 counts increased after two nights. No changes were observed in other lymphocyte counts, in proliferative responses to mitogens, or in plasma levels of cortisol or adrenocorticotropin hormone. The physiologic leukocytosis and NK activity increases during deprivation were eliminated by recovery sleep in a manner parallel to neurobehavioral function, suggesting that the immune alterations may be associated with biological pressure for sleep. PMID:7910171

  19. Depth estimation via parallel coevolution of disparity functions for area-based stereo

    NASA Astrophysics Data System (ADS)

    Liatsis, Panos; Goulermas, John Y.

    2001-02-01

    A novel system for depth estimation is proposed with the use of Symbiotic Genetic Algorithms for the continuous problem of disparity surface approximation. The approach is based on the decomposition of the entire surface to very small non-overlapping patches described by low order bivariate polynomials and the use of symbiotic optimization to enforce smoothness at the boundaries of these patches, so that the entire surface can be approximated in a smooth piecewise fashion by functionals of local support. Such optimization is amenable to a massive parallel implementation, since each patch is optimized by a different execution unit and each unit communicates through its cost function only with its four-connected neighbors. The method makes use of various existing crossover and mutation schemes for real-valued chromosome representations and a new problem-specific mechanism for generating and hybridizing the initial populations. The proposed multi-objective cost function enforces photometric similarity and smoothness between the patch boundaries at a local scale, which in the long term give rise to a globally smooth disparity surface.

  20. Parallel Loss-of-Function at the RPM1 Bacterial Resistance Locus in Arabidopsis thaliana

    PubMed Central

    Rose, Laura; Atwell, Susanna; Grant, Murray; Holub, Eric B.

    2012-01-01

    Dimorphism at the Resistance to Pseudomonas syringae pv. maculicola 1 (RPM1) locus is well documented in natural populations of Arabidopsis thaliana and has been portrayed as a long-term balanced polymorphism. The haplotype from resistant plants contains the RPM1 gene, which enables these plants to recognize at least two structurally unrelated bacterial effector proteins (AvrB and AvrRpm1) from bacterial crop pathogens. A complete deletion of the RPM1 coding sequence has been interpreted as a single event resulting in susceptibility in these individuals. Consequently, the ability to revert to resistance or for alternative R-gene specificities to evolve at this locus has also been lost in these individuals. Our survey of variation at the RPM1 locus in a large species-wide sample of A. thaliana has revealed four new loss-of-function alleles that contain most of the intervening sequence of the RPM1 open reading frame. Multiple loss-of-function alleles may have originated due to the reported intrinsic cost to plants expressing the RPM1 protein. The frequency and geographic distribution of rpm1 alleles observed in our survey indicate the parallel origin and maintenance of these loss-of-function mutations and reveal a more complex history of natural selection at this locus than previously thought. PMID:23272006

  1. Leukocytosis and natural killer cell function parallel neurobehavioral fatigue induced by 64 hours of sleep deprivation.

    PubMed Central

    Dinges, D F; Douglas, S D; Zaugg, L; Campbell, D E; McMann, J M; Whitehouse, W G; Orne, E C; Kapoor, S C; Icaza, E; Orne, M T

    1994-01-01

    The hypothesis that sleep deprivation depresses immune function was tested in 20 adults, selected on the basis of their normal blood chemistry, monitored in a laboratory for 7 d, and kept awake for 64 h. At 2200 h each day measurements were taken of total leukocytes (WBC), monocytes, granulocytes, lymphocytes, eosinophils, erythrocytes (RBC), B and T lymphocyte subsets, activated T cells, and natural killer (NK) subpopulations (CD56/CD8 dual-positive cells, CD16-positive cells, CD57-positive cells). Functional tests included NK cytotoxicity, lymphocyte stimulation with mitogens, and DNA analysis of cell cycle. Sleep loss was associated with leukocytosis and increased NK cell activity. At the maximum sleep deprivation, increases were observed in counts of WBC, granulocytes, monocytes, NK activity, and the proportion of lymphocytes in the S phase of the cell cycle. Changes in monocyte counts correlated with changes in other immune parameters. Counts of CD4, CD16, CD56, and CD57 lymphocytes declined after one night without sleep, whereas CD56 and CD57 counts increased after two nights. No changes were observed in other lymphocyte counts, in proliferative responses to mitogens, or in plasma levels of cortisol or adrenocorticotropin hormone. The physiologic leukocytosis and NK activity increases during deprivation were eliminated by recovery sleep in a manner parallel to neurobehavioral function, suggesting that the immune alterations may be associated with biological pressure for sleep. PMID:7910171

  2. Saccharomyces cerevisiae MPT5 and SSD1 function in parallel pathways to promote cell wall integrity.

    PubMed Central

    Kaeberlein, Matt; Guarente, Leonard

    2002-01-01

    Yeast MPT5 (UTH4) is a limiting component for longevity. We show here that MPT5 also functions to promote cell wall integrity. Loss of Mpt5p results in phenotypes associated with a weakened cell wall, including sorbitol-remedial temperature sensitivity and sensitivities to calcofluor white and sodium dodecyl sulfate. Additionally, we find that mutation of MPT5, in the absence of SSD1-V, is lethal in combination with loss of either Ccr4p or Swi4p. These synthetic lethal interactions are suppressed by the SSD1-V allele. Furthermore, we have provided evidence that the short life span caused by loss of Mpt5p is due to a weakened cell wall. This cell wall defect may be the result of abnormal chitin biosynthesis or accumulation. These analyses have defined three genetic pathways that function in parallel to promote cell integrity: an Mpt5p-containing pathway, an Ssd1p-containing pathway, and a Pkc1p-dependent pathway. This work also provides evidence that post-transcriptional regulation is likely to be important both for maintaining cell integrity and for promoting longevity. PMID:11805047

  3. Instantaneous, parallel mapping of protein electronic function with angle-resolved coherent wave-mixing

    NASA Astrophysics Data System (ADS)

    Mercer, Ian

    2010-03-01

    We present a novel laser method, angle-resolved coherent (ARC) wave-mixing, that separates out coherent electronic couplings from energy transfers in an instantaneous two-dimensional mapping (Ian P. Mercer et al., Phys. Rev. Lett. 102, 57402, 2009). For this we use an ultra-broadband hollow fibre laser source. The power of the new method is demonstrated with the light harvesting complex II (LH2) of purple bacteria at ambient temperature. We observe signatures of coherent quantum electronic beating, a correlation between excitation and emission energies in the protein, and a coherent component to the energy transfer between molecular rings. We are interested in exploring avenues for high-throughput fingerprinting of molecular structure and function. Massively parallel maps, rich in detail, can be taken from solutions, surface films, or solids of between 1 and 1000 microL. Each ARC map is generated instantaneously, with high throughput (currently up to a 1 kHz frame rate) and is noninvasive.

  4. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.

    PubMed

    Natarajan, Chandrasekhar; Projecto-Garcia, Joana; Moriyama, Hideaki; Weber, Roy E; Muñoz-Fuentes, Violeta; Green, Andy J; Kopuchian, Cecilia; Tubaro, Pablo L; Alza, Luis; Bulgarella, Mariana; Smith, Matthew M; Wilson, Robert E; Fago, Angela; McCracken, Kevin G; Storz, Jay F

    2015-12-01

    A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level. PMID:26637114

  5. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level

    PubMed Central

    Natarajan, Chandrasekhar; Projecto-Garcia, Joana; Moriyama, Hideaki; Weber, Roy E.; Muñoz-Fuentes, Violeta; Green, Andy J.; Kopuchian, Cecilia; Tubaro, Pablo L.; Alza, Luis; Bulgarella, Mariana; Smith, Matthew M.; Wilson, Robert E.; Fago, Angela; McCracken, Kevin G.; Storz, Jay F.

    2015-01-01

    A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level. PMID:26637114

  6. High-throughput optogenetic functional magnetic resonance imaging with parallel computations

    PubMed Central

    Fang, Zhongnan; Lee, Jin Hyung

    2013-01-01

    Optogenetic functional magnetic resonance imaging (ofMRI) technology enables cell-type specific, temporally precise neuronal control and accurate, in vivo readout of resulting activity across the whole brain. With the ability to precisely control excitation and inhibition parameters, and to accurately record the resulting activity, there is an increased need for a high-throughput method to bring ofMRI studies to their full potential. In this paper, an advanced system that can allow real-time fMRI with interactive control and analysis in a fraction of the MRI acquisition repetition time (TR) is proposed. With such high processing speed, sufficient time will be available for integration of future developments that can further enhance ofMRI data quality or better streamline the study. We designed and implemented a highly optimized, massively parallel system using graphics processing units (GPUs) which achieves reconstruction, motion correction, and analysis of 3D volume data in approximately 12.80 ms. As a result, with a 750 ms TR and 4-interleave fMRI acquisition, we can now conduct sliding window reconstruction, motion correction, analysis and display in approximately 1.7% of the TR. Therefore, a significant amount of time can now be allocated to integrating advanced but computationally intensive methods that can enable higher image quality and better analysis results all within a TR. Utilizing the proposed high-throughput imaging platform with sliding window reconstruction, we were also able to observe the much-debated initial dips in our ofMRI data. Combined with methods to further improve SNR, the proposed system will enable efficient real-time, interactive, high-throughput ofMRI studies. PMID:23747482
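
    The quoted 1.7% figure follows directly from the numbers stated in the abstract; a two-line check:

        # per-TR processing budget from the figures quoted above
        tr_ms, proc_ms = 750.0, 12.80
        print(f"fraction of TR used: {proc_ms / tr_ms:.1%}")   # -> 1.7%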

  7. Energy distribution functions of kilovolt ions parallel and perpendicular to the magnetic field of a modified Penning discharge

    NASA Technical Reports Server (NTRS)

    Roth, R. J.

    1973-01-01

    The distribution function of ion energy parallel to the magnetic field of a modified Penning discharge has been measured with a retarding potential energy analyzer. These ions escaped through one of the throats of the magnetic mirror geometry. Simultaneous measurements of the ion energy distribution function perpendicular to the magnetic field have been made with a charge exchange neutral detector. The ion energy distribution functions are approximately Maxwellian, and the parallel and perpendicular kinetic temperatures are equal within experimental error. These results suggest that turbulent processes previously observed in this discharge Maxwellianize the velocity distribution along a radius in velocity space and cause an isotropic energy distribution. When the distributions depart from Maxwellian, they are enhanced above the Maxwellian tail.
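
    For readers unfamiliar with the Maxwellian energy form, the sketch below (hypothetical bin values, kT in arbitrary units) evaluates f(E) ∝ √E exp(-E/kT) and a simple moment estimate of the kinetic temperature, the quantity compared between the parallel and perpendicular measurements.

        import numpy as np

        def maxwellian_energy_pdf(E, kT):
            # Maxwellian energy distribution f(E) = 2*sqrt(E/pi) * kT**-1.5 * exp(-E/kT)
            return 2.0 * np.sqrt(E / np.pi) * kT ** -1.5 * np.exp(-E / kT)

        def kT_estimate(E, counts):
            # for a Maxwellian the mean energy is <E> = (3/2) kT, so a simple
            # moment estimate of the kinetic temperature is kT = (2/3) <E>
            return 2.0 * np.sum(E * counts) / (3.0 * np.sum(counts))

        E = np.linspace(0.1, 10.0, 50)                    # hypothetical energy bins
        par = maxwellian_energy_pdf(E, kT=1.2)            # retarding-potential analyzer
        perp = maxwellian_energy_pdf(E, kT=1.2)           # charge-exchange detector
        print(kT_estimate(E, par), kT_estimate(E, perp))  # equal within (toy) error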

  8. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.
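
    A minimal sketch of the domain-decomposition bookkeeping such a toolkit performs (illustrative only, not Charon's actual API): split each grid dimension across a set of processors and record the owned extent plus ghost-cell halos.

        def block_extents(n, nprocs, ghost=1):
            # split n grid points across nprocs, returning the owned (lo, hi)
            # range and the range padded with ghost cells for each processor
            base, rem = divmod(n, nprocs)
            extents, lo = [], 0
            for p in range(nprocs):
                hi = lo + base + (1 if p < rem else 0)
                extents.append({"own": (lo, hi),
                                "with_halo": (max(lo - ghost, 0), min(hi + ghost, n))})
                lo = hi
            return extents

        print(block_extents(100, 4))   # four 25-point subdomains with 1-cell halos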

  9. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    NASA Technical Reports Server (NTRS)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  10. Parallel algorithms and architectures

    SciTech Connect

    Albrecht, A.; Jung, H.; Mehlhorn, K.

    1987-01-01

    The contents of this book are as follows: Preparata: Deterministic simulation of idealized parallel computers on more realistic ones; Convex hull of randomly chosen points from a polytope; Dataflow computing; Parallel in sequence; Towards the architecture of an elementary cortical processor; Parallel algorithms and static analysis of parallel programs; Parallel processing of combinatorial search; Communications; An O(n log n) cost parallel algorithm for the single function coarsest partition problem; Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region; RELACS - a recursive layout computing system; and Parallel linear conflict-free subtree access.

  11. Memorability of commands learned as keywords or function keys: A parallel to voice recognition interfaces

    SciTech Connect

    Sorn, K.; Schultz, E.E. Jr.

    1987-09-15

    Voice recognition interfaces require users to input keywords to access and control functions. An experiment was conducted to compare users' memory for keywords relative to the names of equivalent function keys. Thirty-five subjects attempted to learn word processing functions as keywords or in terms of the names of function keys that allowed access to and control of these functions. Keyword learning produced a significantly higher proportion of correct recalls and fewer intrusions (false recalls) on both an immediate retention test and an unexpected second test two weeks later. Superior keyword memorability is an important potential advantage of voice recognition interfaces.

  12. Requirements for implementing real-time control functional modules on a hierarchical parallel pipelined system

    NASA Technical Reports Server (NTRS)

    Wheatley, Thomas E.; Michaloski, John L.; Lumia, Ronald

    1989-01-01

    Analysis of a robot control system leads to a broad range of processing requirements. One fundamental requirement of a robot control system is sufficient processing capability, which necessitates a microcomputer system. The use of multiple processors in a parallel architecture is beneficial for a number of reasons, including better cost performance, modular growth, increased reliability through replication, and flexibility for testing alternate control strategies via different partitioning. A survey of the progression from low-level control synchronizing primitives to higher-level communication tools is presented. The system communication and control mechanisms of existing robot control systems are compared to the hierarchical control model. The impact of this design methodology on current robot control systems is explored.

  13. Morphological Functions with Parallel Sets for the Pore Space of X-ray CT Images of Soil Columns

    NASA Astrophysics Data System (ADS)

    San José Martínez, F.; Muñoz Ortega, F. J.; Caniego Monreal, F. J.; Peregrina, F.

    2016-03-01

    During the last few decades, new imaging techniques like X-ray computed tomography have made available rich and detailed information of the spatial arrangement of soil constituents, usually referred to as soil structure. Mathematical morphology provides a plethora of mathematical techniques to analyze and parameterize the geometry of soil structure. They provide a guide to design the process from image analysis to the generation of synthetic models of soil structure in order to investigate key features of flow and transport phenomena in soil. In this work, we explore the ability of morphological functions built over Minkowski functionals with parallel sets of the pore space to characterize and quantify pore space geometry of columns of intact soil. These morphological functions seem to discriminate the effects on soil pore space geometry of contrasting management practices in a Mediterranean vineyard, and they provide the first step toward identifying the statistical significance of the observed differences.
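
    A 2D sketch of morphological functions built this way, assuming scipy and scikit-image are available: three Minkowski functionals (area, boundary length, Euler number) are evaluated on successive parallel sets, i.e. dilations of the pore phase by increasing radius r.

        import numpy as np
        from scipy import ndimage
        from skimage import measure

        def morphological_functions(pore, max_r):
            # distance of every pixel to the pore phase; the parallel set at
            # radius r is then simply the set of pixels with distance <= r
            dist = ndimage.distance_transform_edt(~pore)
            rows = []
            for r in range(max_r + 1):
                parallel_set = dist <= r
                rows.append((r,
                             parallel_set.sum(),                  # M0: area
                             measure.perimeter(parallel_set),     # M1: boundary length
                             measure.euler_number(parallel_set))) # M2: connectivity
            return rows

        pore = np.random.rand(128, 128) > 0.7   # toy binary pore space
        for r, area, perim, chi in morphological_functions(pore, 5):
            print(f"r={r}: area={area}, perimeter={perim:.1f}, Euler={chi}")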

  14. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers

    PubMed Central

    Spencer, Sarah J; Tamminen, Manu V; Preheim, Sarah P; Guo, Mira T; Briggs, Adrian W; Brito, Ilana L; Weitz, David A; Pitkänen, Leena K; Vigneault, Francois; Virta, Marko P. Juhani; Alm, Eric J

    2016-01-01

    Many microbial communities are characterized by high genetic diversity. 16S ribosomal RNA sequencing can determine community members, and metagenomics can determine the functional diversity, but resolving the functional role of individual cells in high throughput remains an unsolved challenge. Here, we describe epicPCR (Emulsion, Paired Isolation and Concatenation PCR), a new technique that links functional genes and phylogenetic markers in uncultured single cells, providing a throughput of hundreds of thousands of cells with costs comparable to one genomic library preparation. We demonstrate the utility of our technique in a natural environment by profiling a sulfate-reducing community in a freshwater lake, revealing both known sulfate reducers and discovering new putative sulfate reducers. Our method is adaptable to any conserved genetic trait and translates genetic associations from diverse microbial samples into a sequencing library that answers targeted ecological questions. Potential applications include identifying functional community members, tracing horizontal gene transfer networks and mapping ecological interactions between microbial cells. PMID:26394010

  15. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers.

    PubMed

    Spencer, Sarah J; Tamminen, Manu V; Preheim, Sarah P; Guo, Mira T; Briggs, Adrian W; Brito, Ilana L; Weitz, David A; Pitkänen, Leena K; Vigneault, Francois; Virta, Marko P. Juhani; Alm, Eric J

    2016-02-01

    Many microbial communities are characterized by high genetic diversity. 16S ribosomal RNA sequencing can determine community members, and metagenomics can determine the functional diversity, but resolving the functional role of individual cells in high throughput remains an unsolved challenge. Here, we describe epicPCR (Emulsion, Paired Isolation and Concatenation PCR), a new technique that links functional genes and phylogenetic markers in uncultured single cells, providing a throughput of hundreds of thousands of cells with costs comparable to one genomic library preparation. We demonstrate the utility of our technique in a natural environment by profiling a sulfate-reducing community in a freshwater lake, revealing both known sulfate reducers and discovering new putative sulfate reducers. Our method is adaptable to any conserved genetic trait and translates genetic associations from diverse microbial samples into a sequencing library that answers targeted ecological questions. Potential applications include identifying functional community members, tracing horizontal gene transfer networks and mapping ecological interactions between microbial cells. PMID:26394010

  16. A coarse-grained model for DNA-functionalized spherical colloids, revisited: effective pair potential from parallel replica simulations.

    PubMed

    Theodorakis, Panagiotis E; Dellago, Christoph; Kahl, Gerhard

    2013-01-14

    We discuss a coarse-grained model recently proposed by Starr and Sciortino [J. Phys.: Condens. Matter 18, L347 (2006)] for spherical particles functionalized with short single DNA strands. The model incorporates two key aspects of DNA hybridization, i.e., the specificity of binding between DNA bases and the strong directionality of hydrogen bonds. Here, we calculate the effective potential between two DNA-functionalized particles of equal size using a parallel replica protocol. We find that the transition from bonded to unbonded configurations takes place at considerably lower temperatures compared to those that were originally predicted using standard simulations in the canonical ensemble. We put particular focus on DNA-decorations of tetrahedral and octahedral symmetry, as they are promising candidates for the self-assembly into a single-component diamond structure. Increasing colloid size hinders hybridization of the DNA strands, in agreement with experimental findings. PMID:23320725

  17. Parallel Changes in Structural and Functional Measures of Optic Nerve Myelination after Optic Neuritis

    PubMed Central

    van der Walt, Anneke; Kolbe, Scott; Mitchell, Peter; Wang, Yejun; Butzkueven, Helmut; Egan, Gary; Yiannikas, Con; Graham, Stuart; Kilpatrick, Trevor; Klistorner, Alexander

    2015-01-01

    Introduction Visual evoked potential (VEP) latency prolongation and optic nerve lesion length after acute optic neuritis (ON) correspond to the degree of demyelination, while subsequent recovery of latency may represent optic nerve remyelination. We aimed to investigate the relationship between multifocal VEP (mfVEP) latency and optic nerve lesion length after acute ON. Methods Thirty acute ON patients were studied at 1, 3, 6, and 12 months using mfVEP and at 1 and 12 months with optic nerve MRI. LogMAR and low contrast visual acuity were documented. By one month, the mfVEP amplitude had recovered sufficiently for latency to be measured in 23 patients (76.7%); seven patients had no recordable mfVEP in more than 66% of segments in at least one test. Only data from these 23 patients were analysed further. Results Both latency and lesion length showed significant recovery during the follow-up period. Lesion length and mfVEP latency were highly correlated at 1 month (r = 0.94, p < 0.0001) and 12 months (r = 0.75, p < 0.001). Both measures demonstrated a similar trend of recovery. The speed of latency recovery was faster in the early follow-up period, while lesion length shortening remained relatively constant. At 1 month, latency delay worsened by 1.76 ms for each additional 1 mm of lesion length, while at 12 months, 1 mm of lesion length accounted for 1.94 ms of latency delay. Conclusion A strong association between two putative measures of demyelination in early and chronic ON was found. Parallel recovery of both measures could reflect optic nerve remyelination. PMID:26020925
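
    The per-millimetre latency figures reported above are ordinary regression slopes; a sketch with hypothetical data shows the computation.

        import numpy as np

        # hypothetical lesion lengths (mm) and latency delays (ms), generated
        # around the abstract's 1-month slope of ~1.76 ms per mm
        rng = np.random.default_rng(0)
        lesion_mm = np.array([4.0, 7.5, 10.0, 12.5, 15.0])
        latency_ms = 1.76 * lesion_mm + 3.0 + rng.normal(0.0, 0.5, lesion_mm.size)

        slope, intercept = np.polyfit(lesion_mm, latency_ms, 1)
        r = np.corrcoef(lesion_mm, latency_ms)[0, 1]
        print(f"slope = {slope:.2f} ms/mm, r = {r:.2f}")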

  18. The functional significance of cortical reorganization and the parallel development of CI therapy.

    PubMed

    Taub, Edward; Uswatte, Gitendra; Mark, Victor W

    2014-01-01

    For the nineteenth and the better part of the twentieth centuries two correlative beliefs were strongly held by almost all neuroscientists and practitioners in the field of neurorehabilitation. The first was that after maturity the adult CNS was hardwired and fixed, and second that in the chronic phase after CNS injury no substantial recovery of function could take place no matter what intervention was employed. However, in the last part of the twentieth century evidence began to accumulate that neither belief was correct. First, in the 1960s and 1970s, in research with primates given a surgical abolition of somatic sensation from a single forelimb, which rendered the extremity useless, it was found that behavioral techniques could convert the limb into an extremity that could be used extensively. Beginning in the late 1980s, the techniques employed with deafferented monkeys were translated into a rehabilitation treatment, termed Constraint Induced Movement therapy or CI therapy, for substantially improving the motor deficit in humans of the upper and lower extremities in the chronic phase after stroke. CI therapy has been applied successfully to other types of damage to the CNS such as traumatic brain injury, cerebral palsy, multiple sclerosis, and spinal cord injury, and it has also been used to improve function in focal hand dystonia and for aphasia after stroke. As this work was proceeding, it was being shown during the 1980s and 1990s that sustained modulation of afferent input could alter the structure of the CNS and that this topographic reorganization could have relevance to the function of the individual. The alteration in these once fundamental beliefs has given rise to important recent developments in neuroscience and neurorehabilitation and holds promise for further increasing our understanding of CNS function and extending the boundaries of what is possible in neurorehabilitation. PMID:25018720

  19. The functional significance of cortical reorganization and the parallel development of CI therapy

    PubMed Central

    Taub, Edward; Uswatte, Gitendra; Mark, Victor W.

    2014-01-01

    For the nineteenth and the better part of the twentieth centuries two correlative beliefs were strongly held by almost all neuroscientists and practitioners in the field of neurorehabilitation. The first was that after maturity the adult CNS was hardwired and fixed, and second that in the chronic phase after CNS injury no substantial recovery of function could take place no matter what intervention was employed. However, in the last part of the twentieth century evidence began to accumulate that neither belief was correct. First, in the 1960s and 1970s, in research with primates given a surgical abolition of somatic sensation from a single forelimb, which rendered the extremity useless, it was found that behavioral techniques could convert the limb into an extremity that could be used extensively. Beginning in the late 1980s, the techniques employed with deafferented monkeys were translated into a rehabilitation treatment, termed Constraint Induced Movement therapy or CI therapy, for substantially improving the motor deficit in humans of the upper and lower extremities in the chronic phase after stroke. CI therapy has been applied successfully to other types of damage to the CNS such as traumatic brain injury, cerebral palsy, multiple sclerosis, and spinal cord injury, and it has also been used to improve function in focal hand dystonia and for aphasia after stroke. As this work was proceeding, it was being shown during the 1980s and 1990s that sustained modulation of afferent input could alter the structure of the CNS and that this topographic reorganization could have relevance to the function of the individual. The alteration in these once fundamental beliefs has given rise to important recent developments in neuroscience and neurorehabilitation and holds promise for further increasing our understanding of CNS function and extending the boundaries of what is possible in neurorehabilitation. PMID:25018720

  20. The functional and anatomical organization of marsupial neocortex: Evidence for parallel evolution across mammals

    PubMed Central

    Karlen, Sarah J.; Krubitzer, Leah

    2007-01-01

    Marsupials are a diverse group of mammals that occupy a large range of habitats and have evolved a wide array of unique adaptations. Although they are as diverse as placental mammals, our understanding of marsupial brain organization is more limited. Like placental mammals, marsupials have striking similarities in neocortical organization, such as a constellation of cortical fields including S1, S2, V1, V2, and A1, that are functionally, architectonically, and connectionally distinct. In this review, we describe the general lifestyle and morphological characteristics of all marsupials and the organization of somatosensory, motor, visual, and auditory cortex. For each sensory system, we compare the functional organization and the corticocortical and thalamocortical connections of the neocortex across species. Differences between placental and marsupial species are discussed and the theories on neocortical evolution that have been derived from studying marsupials, particularly the idea of a sensorimotor amalgam, are evaluated. Overall, marsupials inhabit a variety of niches and assume many different lifestyles. For example, marsupials occupy terrestrial, arboreal, burrowing, and aquatic environments; some animals are highly social while others are solitary; and different species are carnivorous, herbivorous, or omnivorous. For each of these adaptations, marsupials have evolved an array of morphological, behavioral, and cortical specializations that are strikingly similar to those observed in placental mammals occupying similar habitats, which indicate that there are constraints imposed on evolving nervous systems that result in recurrent solutions to similar environmental challenges. PMID:17507143

  1. Predictive biomarker discovery through the parallel integration of clinical trial and functional genomics datasets.

    PubMed

    Swanton, Charles; Larkin, James M; Gerlinger, Marco; Eklund, Aron C; Howell, Michael; Stamp, Gordon; Downward, Julian; Gore, Martin; Futreal, P Andrew; Escudier, Bernard; Andre, Fabrice; Albiges, Laurence; Beuselinck, Benoit; Oudard, Stephane; Hoffmann, Jens; Gyorffy, Balázs; Torrance, Chris J; Boehme, Karen A; Volkmer, Hansjuergen; Toschi, Luisella; Nicke, Barbara; Beck, Marlene; Szallasi, Zoltan

    2010-01-01

    The European Union multi-disciplinary Personalised RNA interference to Enhance the Delivery of Individualised Cytotoxic and Targeted therapeutics (PREDICT) consortium has recently initiated a framework to accelerate the development of predictive biomarkers of individual patient response to anti-cancer agents. The consortium focuses on the identification of reliable predictive biomarkers to approved agents with anti-angiogenic activity for which no reliable predictive biomarkers exist: sunitinib, a multi-targeted tyrosine kinase inhibitor and everolimus, a mammalian target of rapamycin (mTOR) pathway inhibitor. Through the analysis of tumor tissue derived from pre-operative renal cell carcinoma (RCC) clinical trials, the PREDICT consortium will use established and novel methods to integrate comprehensive tumor-derived genomic data with personalized tumor-derived small hairpin RNA and high-throughput small interfering RNA screens to identify and validate functionally important genomic or transcriptomic predictive biomarkers of individual drug response in patients. PREDICT's approach to predictive biomarker discovery differs from conventional associative learning approaches, which can be susceptible to the detection of chance associations that lead to overestimation of true clinical accuracy. These methods will identify molecular pathways important for survival and growth of RCC cells and particular targets suitable for therapeutic development. Importantly, our results may enable individualized treatment of RCC, reducing ineffective therapy in drug-resistant disease, leading to improved quality of life and higher cost efficiency, which in turn should broaden patient access to beneficial therapeutics, thereby enhancing clinical outcome and cancer survival. The consortium will also establish and consolidate a European network providing the technological and clinical platform for large-scale functional genomic biomarker discovery. Here we review our current understanding

  2. Parallel blind deconvolution of astronomical images based on the fractal energy ratio of the image and regularization of the point spread function

    NASA Astrophysics Data System (ADS)

    Jia, Peng; Cai, Dongmei; Wang, Dong

    2014-11-01

    A parallel blind deconvolution algorithm is presented. The algorithm incorporates constraints on the point spread function (PSF) derived from the physical imaging process. Additionally, in order to obtain an effective restored image, the fractal energy ratio is used as an evaluation criterion to estimate image quality. The algorithm is parallelized in a fine-grained manner to increase calculation speed. Results of numerical and real experiments indicate that the algorithm is effective.
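
    The paper's specific PSF constraints and fractal-energy-ratio criterion are not reproduced here, but an alternating Richardson-Lucy scheme, a common core for blind deconvolution, gives the flavor of jointly estimating image and PSF under simple physical constraints (non-negativity and unit energy; odd PSF dimensions assumed).

        import numpy as np
        from scipy.signal import fftconvolve

        def blind_richardson_lucy(obs, psf_shape, n_outer=10, n_inner=5, eps=1e-12):
            obs = np.asarray(obs, dtype=float)
            img = np.full_like(obs, obs.mean())               # flat initial image
            psf = np.ones(psf_shape) / np.prod(psf_shape)     # flat initial PSF
            for _ in range(n_outer):
                for _ in range(n_inner):                      # image step, PSF fixed
                    est = fftconvolve(img, psf, mode="same")
                    img *= fftconvolve(obs / (est + eps),
                                       psf[::-1, ::-1], mode="same")
                for _ in range(n_inner):                      # PSF step, image fixed
                    est = fftconvolve(img, psf, mode="same")
                    corr = fftconvolve(obs / (est + eps),
                                       img[::-1, ::-1], mode="same")
                    # crop the correlation to the (centered) PSF support
                    cy, cx = corr.shape[0] // 2, corr.shape[1] // 2
                    hy, hx = psf_shape[0] // 2, psf_shape[1] // 2
                    psf *= corr[cy - hy:cy + hy + 1, cx - hx:cx + hx + 1]
                    psf = np.clip(psf, 0.0, None)             # non-negativity
                    psf /= psf.sum() + eps                    # unit energy
            return img, psf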

  3. Parallel transmit excitation at 1.5 T based on the minimization of a driving function for device heating

    PubMed Central

    Gudino, N.; Sonmez, M.; Yao, Z.; Baig, T.; Nielles-Vallespin, S.; Faranesh, A. Z.; Lederman, R. J.; Martens, M.; Balaban, R. S.; Hansen, M. S.; Griswold, M. A.

    2015-01-01

    Purpose: To provide a rapid method to reduce the radiofrequency (RF) E-field coupling and consequent heating in long conductors in an interventional MRI (iMRI) setup. Methods: A driving function for device heating (W) was defined as the integration of the E-field along the direction of the wire and calculated through a quasistatic approximation. Based on this function, the phases of four independently controlled transmit channels were dynamically changed in a 1.5 T MRI scanner. During the different excitation configurations, the RF induced heating in a nitinol wire immersed in a saline phantom was measured by fiber-optic temperature sensing. Additionally, a minimization of W as a function of phase and amplitude values of the different channels and constrained by the homogeneity of the RF excitation field (B1) over a region of interest was proposed and its results tested on the benchtop. To analyze the validity of the proposed method, using a model of the array and phantom setup tested in the scanner, RF fields and SAR maps were calculated through finite-difference time-domain (FDTD) simulations. In addition to phantom experiments, RF induced heating of an active guidewire inserted in a swine was also evaluated. Results: In the phantom experiment, heating at the tip of the device was reduced by 92% when replacing the body coil by an optimized parallel transmit excitation with same nominal flip angle. In the benchtop, up to 90% heating reduction was measured when implementing the constrained minimization algorithm with the additional degree of freedom given by independent amplitude control. The computation of the optimum phase and amplitude values was executed in just 12 s using a standard CPU. The results of the FDTD simulations showed similar trend of the local SAR at the tip of the wire and measured temperature as well as to a quadratic function of W, confirming the validity of the quasistatic approach for the presented problem at 64 MHz. Imaging and heating

  4. Parallel transmit excitation at 1.5 T based on the minimization of a driving function for device heating

    SciTech Connect

    Gudino, N.; Sonmez, M.; Nielles-Vallespin, S.; Faranesh, A. Z.; Lederman, R. J.; Balaban, R. S.; Hansen, M. S.; Yao, Z.; Baig, T.; Martens, M.; Griswold, M. A.

    2015-01-15

    Purpose: To provide a rapid method to reduce the radiofrequency (RF) E-field coupling and consequent heating in long conductors in an interventional MRI (iMRI) setup. Methods: A driving function for device heating (W) was defined as the integration of the E-field along the direction of the wire and calculated through a quasistatic approximation. Based on this function, the phases of four independently controlled transmit channels were dynamically changed in a 1.5 T MRI scanner. During the different excitation configurations, the RF induced heating in a nitinol wire immersed in a saline phantom was measured by fiber-optic temperature sensing. Additionally, a minimization of W as a function of phase and amplitude values of the different channels and constrained by the homogeneity of the RF excitation field (B1) over a region of interest was proposed and its results tested on the benchtop. To analyze the validity of the proposed method, using a model of the array and phantom setup tested in the scanner, RF fields and SAR maps were calculated through finite-difference time-domain (FDTD) simulations. In addition to phantom experiments, RF induced heating of an active guidewire inserted in a swine was also evaluated. Results: In the phantom experiment, heating at the tip of the device was reduced by 92% when replacing the body coil by an optimized parallel transmit excitation with same nominal flip angle. In the benchtop, up to 90% heating reduction was measured when implementing the constrained minimization algorithm with the additional degree of freedom given by independent amplitude control. The computation of the optimum phase and amplitude values was executed in just 12 s using a standard CPU. The results of the FDTD simulations showed similar trend of the local SAR at the tip of the wire and measured temperature as well as to a quadratic function of W, confirming the validity of the quasistatic approach for the presented problem at 64 MHz. Imaging and heating
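
    A soft-penalty variant of the constrained minimization described above can be sketched in a few lines; w, b1, target, and lam are hypothetical stand-ins for the per-channel contributions to the heating driving function, the per-channel B1 maps over the ROI, the target excitation level, and the constraint weight.

        import numpy as np
        from scipy.optimize import minimize

        def optimize_phases(w, b1, target, lam=10.0):
            # minimize |W|^2 = |sum_c e^{i phi_c} w_c|^2 plus a penalty that
            # keeps the combined B1 magnitude near the target over the ROI
            def cost(phi):
                shim = np.exp(1j * phi)
                heat = np.abs(shim @ w) ** 2
                homogeneity = np.mean((np.abs(shim @ b1) - target) ** 2)
                return heat + lam * homogeneity
            return minimize(cost, x0=np.zeros(len(w)), method="Nelder-Mead").x

        rng = np.random.default_rng(0)                  # toy 4-channel example
        w = rng.standard_normal(4) + 1j * rng.standard_normal(4)
        b1 = rng.standard_normal((4, 50)) + 1j * rng.standard_normal((4, 50))
        print(optimize_phases(w, b1, target=1.0))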

  5. EUPDF: Eulerian Monte Carlo Probability Density Function Solver for Applications With Parallel Computing, Unstructured Grids, and Sprays

    NASA Technical Reports Server (NTRS)

    Raju, M. S.

    1998-01-01

    The success of any solution methodology used in the study of gas-turbine combustor flows depends a great deal on how well it can model the various complex and rate controlling processes associated with the spray's turbulent transport, mixing, chemical kinetics, evaporation, and spreading rates, as well as convective and radiative heat transfer and other phenomena. The phenomena to be modeled, which are controlled by these processes, often strongly interact with each other at different times and locations. In particular, turbulence plays an important role in determining the rates of mass and heat transfer, chemical reactions, and evaporation in many practical combustion devices. The influence of turbulence in a diffusion flame manifests itself in several forms, ranging from the so-called wrinkled, or stretched, flamelets regime to the distributed combustion regime, depending upon how turbulence interacts with various flame scales. Conventional turbulence models have difficulty treating highly nonlinear reaction rates. A solution procedure based on the composition joint probability density function (PDF) approach holds the promise of modeling various important combustion phenomena relevant to practical combustion devices (such as extinction, blowoff limits, and emissions predictions) because it can account for nonlinear chemical reaction rates without making approximations. In an attempt to advance the state-of-the-art in multidimensional numerical methods, we at the NASA Lewis Research Center extended our previous work on the PDF method to unstructured grids, parallel computing, and sprays. EUPDF, which was developed by M.S. Raju of Nyma, Inc., was designed to be massively parallel and could easily be coupled with any existing gas-phase and/or spray solvers. EUPDF can use an unstructured mesh with mixed triangular, quadrilateral, and/or tetrahedral elements. The application of the PDF method showed favorable results when applied to several supersonic

  6. Eclipse Parallel Tools Platform

    SciTech Connect

    Watson, Gregory; DeBardeleben, Nathan; Rasmussen, Craig

    2005-02-18

    Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs in these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices, and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to provide a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform-specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framework by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration, support for a small number of parallel architectures, and basis

  7. A Sparse Self-Consistent Field Algorithm and Its Parallel Implementation: Application to Density-Functional-Based Tight Binding.

    PubMed

    Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias

    2014-06-10

    We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion. PMID:26580754
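
    The essential structure of such a self-consistent loop, stripped of the paper's local-orbital machinery, is sketched below for a toy charge-self-consistent tight-binding chain stored in sparse form (illustrative, not the published algorithm).

        import numpy as np
        from scipy.sparse import diags
        from scipy.sparse.linalg import eigsh

        def toy_scf(h0, u=2.0, n_elec=None, max_iter=50, tol=1e-8, mix=0.3):
            n = h0.shape[0]
            n_occ = (n_elec if n_elec is not None else n) // 2
            q = np.zeros(n)                             # net charge per site
            for _ in range(max_iter):
                h = h0 + diags(u * q)                   # charge-dependent on-site shift
                vals, vecs = eigsh(h, k=n_occ, which="SA")   # occupied states
                # Mulliken-like charges: 2 electrons per occupied orbital,
                # measured relative to one electron per site (half filling)
                q_new = 2.0 * np.sum(vecs ** 2, axis=1) - 1.0
                if np.linalg.norm(q_new - q) < tol:
                    break
                q = (1.0 - mix) * q + mix * q_new       # linear mixing for stability
            return q, vals

        n = 100                                         # 1D chain, hopping t = -1
        h0 = diags([-1.0, -1.0], [-1, 1], shape=(n, n)).tocsc()
        charges, energies = toy_scf(h0, n_elec=n)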

  8. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication

    NASA Astrophysics Data System (ADS)

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimensional (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’-shaped microchannel. The key novelty of our approach lies in rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high-quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions, including prebaking time, laser power, and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in a microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of polystyrene (PS) particles and cancer cells of different sizes. The filter can be cleaned by reversing the flow and reused many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips.

  9. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication.

    PubMed

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimensional (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given 'Y'-shaped microchannel. The key novelty of our approach lies in rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high-quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions, including prebaking time, laser power, and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in a microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of polystyrene (PS) particles and cancer cells of different sizes. The filter can be cleaned by reversing the flow and reused many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips. PMID:26818119

  10. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication

    PubMed Central

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimensional (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’-shaped microchannel. The key novelty of our approach lies in rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high-quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions, including prebaking time, laser power, and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in a microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of polystyrene (PS) particles and cancer cells of different sizes. The filter can be cleaned by reversing the flow and reused many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips. PMID:26818119

  11. Parallel functional activity profiling reveals valvulopathogens are potent 5-hydroxytryptamine(2B) receptor agonists: implications for drug safety assessment.

    PubMed

    Huang, Xi-Ping; Setola, Vincent; Yadav, Prem N; Allen, John A; Rogan, Sarah C; Hanson, Bonnie J; Revankar, Chetana; Robers, Matt; Doucette, Chris; Roth, Bryan L

    2009-10-01

    Drug-induced valvular heart disease (VHD) is a serious side effect of a few medications, including some that are on the market. Pharmacological studies of VHD-associated medications (e.g., fenfluramine, pergolide, methysergide, and cabergoline) have revealed that they and/or their metabolites are potent 5-hydroxytryptamine(2B) (5-HT(2B)) receptor agonists. We have shown that activation of 5-HT(2B) receptors on human heart valve interstitial cells in vitro induces a proliferative response reminiscent of the fibrosis that typifies VHD. To identify current or future drugs that might induce VHD, we screened approximately 2200 U.S. Food and Drug Administration (FDA)-approved or investigational medications to identify 5-HT(2B) receptor agonists, using calcium-based high-throughput screening. Of these 2200 compounds, 27 were 5-HT(2B) receptor agonists (hits); 14 of these had previously been identified as 5-HT(2B) receptor agonists, including seven bona fide valvulopathogens. Six of the hits (guanfacine, quinidine, xylometazoline, oxymetazoline, fenoldopam, and ropinirole) are approved medications. Twenty-three of the hits were then "functionally profiled" (i.e., assayed in parallel for 5-HT(2B) receptor agonism using multiple readouts to test for functional selectivity). In these assays, the known valvulopathogens were efficacious at concentrations as low as 30 nM, whereas the other compounds were less so. Hierarchical clustering analysis of the pEC(50) data revealed that ropinirole (which is not associated with valvulopathy) was clearly segregated from known valvulopathogens. Taken together, our data demonstrate that patterns of 5-HT(2B) receptor functional selectivity might be useful for identifying compounds likely to induce valvular heart disease. PMID:19570945
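
    The hierarchical clustering step is standard; with a hypothetical matrix of pEC50 values across several functional readouts it can be reproduced in a few lines.

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        # hypothetical pEC50 profiles: rows = compounds, columns = readouts
        pec50 = np.array([[8.1, 7.9, 8.3],    # known valvulopathogen
                          [8.0, 8.2, 7.8],    # known valvulopathogen
                          [6.2, 5.9, 6.4],    # weak agonist
                          [6.0, 6.3, 5.8]])   # weak agonist (ropinirole-like)
        Z = linkage(pec50, method="average", metric="euclidean")
        print(fcluster(Z, t=2, criterion="maxclust"))
        # potent agonists fall into one cluster, weak agonists into the other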

  12. Functional Traits in Parallel Evolutionary Radiations and Trait-Environment Associations in the Cape Floristic Region of South Africa.

    PubMed

    Mitchell, Nora; Moore, Timothy E; Mollmann, Hayley Kilroy; Carlson, Jane E; Mocko, Kerri; Martinez-Cabrera, Hugo; Adams, Christopher; Silander, John A; Jones, Cynthia S; Schlichting, Carl D; Holsinger, Kent E

    2015-04-01

    Evolutionary radiations with extreme levels of diversity present a unique opportunity to study the role of the environment in plant evolution. If environmental adaptation played an important role in such radiations, we expect to find associations between functional traits and key climatic variables. Similar trait-environment associations across clades may reflect common responses, while contradictory associations may suggest lineage-specific adaptations. Here, we explore trait-environment relationships in two evolutionary radiations in the fynbos biome of the highly biodiverse Cape Floristic Region (CFR) of South Africa. Protea and Pelargonium are morphologically and evolutionarily diverse genera that typify the CFR yet are substantially different in growth form and morphology. Our analytical approach employs a Bayesian multiple-response generalized linear mixed-effects model, taking into account covariation among traits and controlling for phylogenetic relationships. Of the pairwise trait-environment associations tested, 6 out of 24 were in the same direction and 2 out of 24 were in opposite directions, with the latter apparently reflecting alternative life-history strategies. These findings demonstrate that trait diversity within two plant lineages may reflect both parallel and idiosyncratic responses to the environment, rather than all taxa conforming to a global-scale pattern. Such insights are essential for understanding how trait-environment associations arise and how they influence species diversification. PMID:25811086

  13. Parallelization and improvements of the generalized born model with a simple sWitching function for modern graphics processors.

    PubMed

    Arthur, Evan J; Brooks, Charles L

    2016-04-15

    Two fundamental challenges of simulating biologically relevant systems are the rapid calculation of the energy of solvation and the trajectory length of a given simulation. The Generalized Born model with a Simple sWitching function (GBSW) addresses these issues by using an efficient approximation of Poisson-Boltzmann (PB) theory to calculate each solute atom's free energy of solvation, the gradient of this potential, and the subsequent forces of solvation without the need for explicit solvent molecules. This study presents a parallel refactoring of the original GBSW algorithm and its implementation on newly available, low cost graphics chips with thousands of processing cores. Depending on the system size and nonbonded force cutoffs, the new GBSW algorithm offers speed increases of between one and two orders of magnitude over previous implementations while maintaining similar levels of accuracy. We find that much of the algorithm scales linearly with an increase of system size, which makes this water model cost effective for solvating large systems. Additionally, we utilize our GPU-accelerated GBSW model to fold the model system chignolin, and in doing so we demonstrate that these speed enhancements now make accessible folding studies of peptides and potentially small proteins. © 2016 Wiley Periodicals, Inc. PMID:26786647
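
    GBSW itself approximates Poisson-Boltzmann theory with a switching function; the sketch below instead shows the generic pairwise generalized Born energy (the Still et al. effective distance) that such models evaluate for every atom pair, the part that parallelizes naturally across GPU cores. It is a generic GB sketch, not the GBSW formulation, and the units are illustrative.

        import numpy as np

        def gb_energy(q, r_born, coords, eps_solvent=78.5):
            # Delta G = -1/2 (1 - 1/eps) sum_ij q_i q_j / f_GB(r_ij, R_i, R_j)
            pre = -0.5 * (1.0 - 1.0 / eps_solvent)
            d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
            rb = r_born[:, None] * r_born[None, :]
            f_gb = np.sqrt(d2 + rb * np.exp(-d2 / (4.0 * rb)))   # Still et al.
            return pre * np.sum(np.outer(q, q) / f_gb)           # includes self terms

        q = np.array([0.4, -0.8, 0.4])                    # toy three-atom solute
        r_born = np.array([1.5, 1.7, 1.5])                # effective Born radii
        coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
        print(gb_energy(q, r_born, coords))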

  14. Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

    NASA Astrophysics Data System (ADS)

    Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

    2010-12-01

    Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization. New version program summary: Program title: PROFESS; Catalogue identifier: AEBN_v2_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html; No. of lines in distributed program, including test data, etc.: 68 721; No. of bytes in distributed program, including test data, etc.: 1 708 547; Distribution format: tar.gz; Programming language: Fortran 90; Computer

  15. Kinematic Modeling and Function Generation for Non-linear Curves Using 5R Double Arm Parallel Manipulator

    NASA Astrophysics Data System (ADS)

    Keshavkumar Kamaliya, Parth; Patel, Yashavant Kumar Dashrathlal

    2016-01-01

    Double-arm configurations using parallel manipulators mimic human arm motions in either planar or spatial space. These configurations are currently attractive to researchers because they can replace human workers without major redesign of industrial workplaces. The joint-range limitations of human arms can be overcome by substituting revolute or spherical joints in the manipulator, allowing fuller use of the workspace. A planar configuration with five revolute joints (5R) is considered to imitate human arm motions in a plane using a Double Arm Manipulator (DAM). Position analysis for a tool held in the end links of the configuration is carried out using Pro/Mechanism in Creo® as well as SimMechanics. Denavit-Hartenberg (D-H) parameters are formulated, and results derived from the developed MATLAB programs are compared with the mechanism simulation as well as SimMechanics results. An inverse kinematics model is developed for trajectory planning so that the tool traces its path in a continuous, smooth sequence. Polynomial functions are derived for position, velocity, and acceleration for linear and non-linear curves in joint space. Analytical results obtained for trajectory planning are validated against simulation results from Creo®.
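
    Since each arm of the planar 5R mechanism is a two-link chain from its own base to the shared tool point, the inverse kinematics reduces to the closed-form 2R solution applied per arm. A sketch under assumed link lengths and base positions (not the paper's geometry):

    ```python
    import numpy as np

    def two_link_ik(px, py, l1, l2, elbow=+1):
        """Closed-form IK for one 2R arm reaching the point (px, py)."""
        c2 = (px**2 + py**2 - l1**2 - l2**2) / (2.0 * l1 * l2)
        if abs(c2) > 1.0:
            raise ValueError("point outside the arm's workspace")
        t2 = elbow * np.arccos(c2)
        t1 = np.arctan2(py, px) - np.arctan2(l2 * np.sin(t2), l1 + l2 * np.cos(t2))
        return t1, t2

    # the two arms share the tool point; opposite elbow branches keep them apart
    base_left, base_right = np.array([-0.2, 0.0]), np.array([0.2, 0.0])
    tool = np.array([0.1, 0.35])
    for base, elbow in ((base_left, +1), (base_right, -1)):
        rel = tool - base
        print(two_link_ik(rel[0], rel[1], l1=0.25, l2=0.25, elbow=elbow))
    ```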

  16. A preliminary numerical evaluation of a parallel algorithm for approximating the values and subgradients of the recourse function in a stochastic program with complete recourse

    SciTech Connect

    Lessor, K.S.

    1988-08-26

    The parallel algorithm of Ariyawansa, Sorensen, and Wets for approximating the values and subgradients of the recourse function in a stochastic program with complete recourse is implemented and timing results are reported for limited experimental trials. 14 refs., 6 figs., 8 tabs.

  17. Parallel rendering

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
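
    As a concrete instance of the data-decomposition ideas surveyed here, the sketch below partitions image rows (an image-space decomposition) across worker processes and assembles the result, with a trivial procedural shader standing in for a real rasterizer or ray tracer:

    ```python
    import numpy as np
    from multiprocessing import Pool

    WIDTH, HEIGHT, WORKERS = 640, 480, 4

    def shade_band(bounds):
        y0, y1 = bounds
        ys, xs = np.mgrid[y0:y1, 0:WIDTH]
        return y0, np.sin(xs * 0.05) * np.cos(ys * 0.05)  # placeholder shading

    if __name__ == "__main__":
        edges = np.linspace(0, HEIGHT, WORKERS + 1, dtype=int)
        image = np.empty((HEIGHT, WIDTH))
        with Pool(WORKERS) as pool:
            for y0, band in pool.map(shade_band, list(zip(edges[:-1], edges[1:]))):
                image[y0:y0 + band.shape[0]] = band   # the image-assembly step
        print(image.shape)
    ```

    Static row bands are simple but can load-imbalance when scene complexity varies across the image, which is exactly the trade-off the survey discusses under task granularity and load balancing.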

  18. Eclipse Parallel Tools Platform

    Energy Science and Technology Software Center (ESTSC)

    2005-02-18

    Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs in these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to provide a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform-specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framework by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration, and support for a small number of parallel architectures

  19. Three pillars for achieving quantum mechanical molecular dynamics simulations of huge systems: Divide-and-conquer, density-functional tight-binding, and massively parallel computation.

    PubMed

    Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi

    2016-08-01

    The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. Functions to perform large-scale geometry optimization and molecular dynamics on the DC-DFTB potential energy surface are implemented in the program, called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program: a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7,290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. PMID:27317328
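
    The Fermi-level determination that the paper parallelizes amounts to a one-dimensional root find: choose mu so that the Fermi-weighted occupations of the levels sum to the electron count. Below is only the textbook serial bisection baseline; the paper's contribution is an interpolation-based parallel version of this step, which is not reproduced here:

    ```python
    import numpy as np

    def fermi_level(levels, n_electrons, kt=0.01, tol=1e-10):
        """Bisect for mu such that sum_i 2/(1+exp((e_i-mu)/kT)) = N."""
        def count(mu):
            z = np.clip((levels - mu) / kt, -60.0, 60.0)  # avoid exp overflow
            return np.sum(2.0 / (1.0 + np.exp(z)))
        lo, hi = levels.min() - 1.0, levels.max() + 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if count(mid) < n_electrons else (lo, mid)
        return 0.5 * (lo + hi)

    eps = np.sort(np.random.default_rng(0).normal(size=50))
    print(fermi_level(eps, n_electrons=30))
    ```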

  20. Dynamic multi-swarm particle swarm optimizer using parallel PC cluster systems for global optimization of large-scale multimodal functions

    NASA Astrophysics Data System (ADS)

    Fan, Shu-Kai S.; Chang, Ju-Ming

    2010-05-01

    This article presents a novel parallel multi-swarm optimization (PMSO) algorithm with the aim of enhancing the search ability of standard single-swarm PSOs for global optimization of very large-scale multimodal functions. Different from the existing multi-swarm structures, the multiple swarms work in parallel, and the search space is partitioned evenly and dynamically assigned in a weighted manner via the roulette wheel selection (RWS) mechanism. This parallel, distributed framework of the PMSO algorithm is developed based on a master-slave paradigm, which is implemented on a cluster of PCs using message passing interface (MPI) for information interchange among swarms. The PMSO algorithm handles multiple swarms simultaneously and each swarm performs PSO operations of its own independently. In particular, one swarm is designated for global search and the others are for local search. The first part of the experimental comparison is made among the PMSO, standard PSO, and two state-of-the-art algorithms (CTSS and CLPSO) in terms of various un-rotated and rotated benchmark functions taken from the literature. In the second part, the proposed multi-swarm algorithm is tested on large-scale multimodal benchmark functions up to 300 dimensions. The results of the PMSO algorithm show great promise in solving high-dimensional problems.
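
    A serial sketch of the multi-swarm idea: several swarms run independent PSO updates and share only their best-known point, standing in for the MPI master-slave exchange in PMSO. The inertia and acceleration constants and the Rastrigin benchmark are common defaults, not necessarily the paper's settings:

    ```python
    import numpy as np

    def rastrigin(x):                       # standard multimodal benchmark
        return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10, axis=-1)

    DIM, SWARMS, PARTICLES, ITERS = 10, 4, 30, 200
    w, c1, c2 = 0.72, 1.49, 1.49
    rng = np.random.default_rng(1)

    x = rng.uniform(-5.12, 5.12, (SWARMS, PARTICLES, DIM))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), rastrigin(x)
    for _ in range(ITERS):
        gbest = pbest.reshape(-1, DIM)[np.argmin(pbest_f)]  # inter-swarm exchange
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, -5.12, 5.12)
        f = rastrigin(x)
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
    print("best value found:", pbest_f.min())
    ```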

  1. Massively parallel visualization: Parallel rendering

    SciTech Connect

    Hansen, C.D.; Krogh, M.; White, W.

    1995-12-01

    This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygon, sphere, and volumetric data. The polygon algorithm uses a data parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  2. Random-iteration algorithm-based optical parallel architecture for fractal-image decoding by use of iterated-function system codes.

    PubMed

    Chang, H T; Kuo, C J

    1998-03-10

    An optical parallel architecture for the random-iteration algorithm to decode a fractal image by use of iterated-function system (IFS) codes is proposed. The code value is first converted into transmittance in film or a spatial light modulator in the optical part of the system. With an optical-to-electrical converter, electrical-to-optical converter, and some electronic circuits for addition and delay, we can perform the contractive affine transformation (CAT) denoted in IFS codes. In the proposed decoding architecture all CAT's generate points (image pixels) in parallel, and these points then are joined for display purposes. Therefore the decoding speed is improved greatly compared with existing serial-decoding architectures. In addition, an error and stability analysis that considers nonperfect elements is presented for the proposed optical system. Finally, simulation results are given to validate the proposed architecture. PMID:18268718
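
    The random-iteration ("chaos game") loop being parallelized is compact enough to state directly: at each step one contractive affine map is drawn with its associated probability and applied to the current point, and the visited points accumulate into the attractor. A sketch using Barnsley's classic fern IFS codes rather than the paper's test images:

    ```python
    import numpy as np

    # each row: a, b, c, d, e, f for the map (x, y) -> (a x + b y + e, c x + d y + f)
    maps = np.array([
        [ 0.00,  0.00,  0.00, 0.16, 0.0, 0.00],
        [ 0.85,  0.04, -0.04, 0.85, 0.0, 1.60],
        [ 0.20, -0.26,  0.23, 0.22, 0.0, 1.60],
        [-0.15,  0.28,  0.26, 0.24, 0.0, 0.44],
    ])
    probs = np.array([0.01, 0.85, 0.07, 0.07])

    rng = np.random.default_rng(0)
    pt, points = np.zeros(2), []
    for _ in range(50_000):
        a, b, c, d, e, f = maps[rng.choice(len(maps), p=probs)]
        pt = np.array([a * pt[0] + b * pt[1] + e, c * pt[0] + d * pt[1] + f])
        points.append(pt)
    print(np.array(points).mean(axis=0))  # the point cloud traces out the fern
    ```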

  3. Parallel machines: Parallel machine languages

    SciTech Connect

    Iannucci, R.A.

    1990-01-01

    This book presents a framework for understanding the tradeoffs between the conventional view and the dataflow view, with the objective of discovering the critical hardware structures which must be present in any scalable, general-purpose parallel computer to effectively tolerate latency and synchronization costs. The author presents an approach to scalable general-purpose parallel computation, in which linguistic concerns, compiling issues, intermediate language issues, and hardware/technological constraints are treated as a combined approach to architectural development. The book closes with the notion of a parallel machine language.

  4. Parallel pipelining

    SciTech Connect

    Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.

    1995-09-01

    In this paper the authors introduce the idea of parallel pipelining for water lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as N = (R/r)^α, where r and R are the radii of the small and large pipes, respectively, and α = 4 or 19/7 when the lubricating water flow is laminar or turbulent.
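
    A quick worked instance of the pipe-count estimate, with illustrative radii rather than the paper's numbers:

    ```python
    # N = (R/r)**alpha, alpha = 4 (laminar water flow) or 19/7 (turbulent)
    R, r = 0.50, 0.10   # large- and small-pipe radii, metres
    for label, alpha in (("laminar", 4.0), ("turbulent", 19.0 / 7.0)):
        print(label, round((R / r) ** alpha))  # -> laminar 625, turbulent 79
    ```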

  5. rasterEngine: an easy-to-use R function for applying complex geostatistical models to raster datasets in a parallel computing environment

    NASA Astrophysics Data System (ADS)

    Greenberg, J. A.

    2013-12-01

    As geospatial analyses progress in tandem with increasing availability of large complex geographic data sets and high performance computing (HPC), there is an increasing gap in the ability of end-user tools to take advantage of these advances. Specifically, the practical implementation of complex statistical models on large gridded geographic datasets (e.g. remote sensing analysis, species distribution mapping, topographic transformations, and local neighborhood analyses) currently requires a significant knowledge base. A user must be proficient in the chosen model as well as the nuances of scientific programming, raster data models, memory management, parallel computing, and system design. This is further complicated by the fact that many of the cutting-edge analytical tools were developed for non-geospatial datasets and are not part of standard GIS packages, but are available in scientific computing languages such as R and MATLAB. We present a computing function 'rasterEngine' written in the R scientific computing language and part of the CRAN package 'spatial.tools' with these challenges in mind. The goal of rasterEngine is to allow a user to quickly develop and apply analytical models within the R computing environment to arbitrarily large gridded datasets, taking advantage of available parallel computing resources, and without requiring a deep understanding of HPC and raster data models. We provide several examples of rasterEngine being used to solve common grid based analyses, including remote sensing image analyses, topographic transformations, and species distribution modeling. With each example, the parallel processing performance results are presented.
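
    A Python analogue of what rasterEngine automates in R: split a large grid into chunks, apply the user's per-chunk model in worker processes, and reassemble, so that no single worker ever holds the full raster. File I/O and the geostatistical model are elided; the per-pixel transform is a stand-in:

    ```python
    import numpy as np
    from multiprocessing import Pool

    def process_chunk(chunk):
        return np.sqrt(np.abs(chunk))   # stand-in for the user's model

    if __name__ == "__main__":
        raster = np.random.default_rng(0).normal(size=(4000, 4000))
        chunks = np.array_split(raster, 8, axis=0)      # row-wise chunking
        with Pool(4) as pool:
            result = np.vstack(pool.map(process_chunk, chunks))
        print(result.shape)
    ```

    Focal (neighborhood) operations additionally require overlapping chunk edges, one of the details such a wrapper has to manage for the user.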

  6. Accelerating the performance of a novel meshless method based on collocation with radial basis functions by employing a graphical processing unit as a parallel coprocessor

    NASA Astrophysics Data System (ADS)

    Owusu-Banson, Derek

    In recent times, a variety of industries, applications, and numerical methods, including the meshless method, have enjoyed a great deal of success by utilizing the graphical processing unit (GPU) as a parallel coprocessor. These benefits often include performance improvements over previous implementations. Furthermore, applications running on graphics processors enjoy superior performance per dollar and performance per watt compared to implementations built exclusively on traditional central processing technologies. The GPU was originally designed for graphics acceleration, but the modern GPU, known as the General Purpose Graphical Processing Unit (GPGPU), can be used for scientific and engineering calculations. The GPGPU consists of a massively parallel array of integer and floating point processors, typically with hundreds of processors per graphics card and dedicated high-speed memory. This work describes an application written by the author, titled GaussianRBF, which shows the implementation and results of a novel meshless method that incorporates collocation of the Gaussian radial basis function by utilizing the GPU as a parallel coprocessor. Key phases of the proposed meshless method have been executed on the GPU using the NVIDIA CUDA software development kit. In particular, the matrix-fill and solution phases have been carried out on the GPU, along with some post-processing. This approach resulted in decreased processing time compared to a similar algorithm implemented on the CPU while maintaining the same accuracy.
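
    The matrix-fill and solution phases that GaussianRBF offloads to the GPU have a compact serial form. A NumPy sketch of dense Gaussian-RBF collocation for interpolation, with an arbitrary test function and an illustrative shape parameter:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    centers = rng.uniform(-1, 1, (200, 2))   # collocation points
    eps = 3.0                                # shape parameter (illustrative)

    def gaussian_matrix(pts_a, pts_b):
        d2 = np.sum((pts_a[:, None, :] - pts_b[None, :, :]) ** 2, axis=-1)
        return np.exp(-(eps**2) * d2)

    f = np.sin(np.pi * centers[:, 0]) * centers[:, 1]  # data at the centers
    A = gaussian_matrix(centers, centers)              # "matrix fill" phase
    lam = np.linalg.solve(A, f)                        # "solution" phase

    test = rng.uniform(-1, 1, (5, 2))
    approx = gaussian_matrix(test, centers) @ lam
    exact = np.sin(np.pi * test[:, 0]) * test[:, 1]
    print("max interpolation error:", np.max(np.abs(approx - exact)))
    ```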

  7. Data parallelism

    SciTech Connect

    Gorda, B.C.

    1992-09-01

    Data locality is fundamental to performance on distributed memory parallel architectures. Application programmers know this well and go to great pains to arrange data for optimal performance. Data Parallelism, a model from the Single Instruction Multiple Data (SIMD) architecture, is finding a new home on the Multiple Instruction Multiple Data (MIMD) architectures. This style of programming, distinguished by taking the computation to the data, is what programmers have been doing by hand for a long time. Recent work in this area holds the promise of making the programmer's task easier.

  9. Parallel Total Energy

    Energy Science and Technology Software Center (ESTSC)

    2004-10-21

    This is a total energy electronic structure code using the Local Density Approximation (LDA) of density functional theory. It uses plane waves as the wave function basis set. It can use both norm-conserving pseudopotentials and ultrasoft pseudopotentials, and it can relax the atomic positions according to the total energy. It is a parallel code using MPI.

  10. Functional development of mechanosensitive hair cells in stem cell-derived organoids parallels native vestibular hair cells

    PubMed Central

    Liu, Xiao-Ping; Koehler, Karl R.; Mikosz, Andrew M.; Hashino, Eri; Holt, Jeffrey R.

    2016-01-01

    Inner ear sensory epithelia contain mechanosensitive hair cells that transmit information to the brain through innervation with bipolar neurons. Mammalian hair cells do not regenerate and are limited in number. Here we investigate the potential to generate mechanosensitive hair cells from mouse embryonic stem cells in a three-dimensional (3D) culture system. The system faithfully recapitulates mouse inner ear induction followed by self-guided development into organoids that morphologically resemble inner ear vestibular organs. We find that organoid hair cells acquire mechanosensitivity equivalent to functionally mature hair cells in postnatal mice. The organoid hair cells also progress through a similar dynamic developmental pattern of ion channel expression, reminiscent of two subtypes of native vestibular hair cells. We conclude that our 3D culture system can generate large numbers of fully functional sensory cells which could be used to investigate mechanisms of inner ear development and disease as well as regenerative mechanisms for inner ear repair. PMID:27215798

  12. A study of parallelizing O(N) Green-function-based Monte Carlo method for many fermions coupled with classical degrees of freedom

    NASA Astrophysics Data System (ADS)

    Zhang, Shixun; Yamagia, Shinichi; Yunoki, Seiji

    2013-08-01

    Models of fermions interacting with classical degrees of freedom are applied to a large variety of systems in condensed matter physics. For this class of models, Weiße [Phys. Rev. Lett. 102, 150604 (2009)] has recently proposed a very efficient numerical method, called the O(N) Green-Function-Based Monte Carlo (GFMC) method, in which a kernel polynomial expansion technique is used to avoid the full numerical diagonalization of the fermion Hamiltonian matrix of size N, which usually costs O(N³) computational complexity. Motivated by this background, in this paper we apply the GFMC method to the double exchange model in three spatial dimensions. We mainly focus on the implementation of the GFMC method using both MPI on a CPU-based cluster and Nvidia's Compute Unified Device Architecture (CUDA) programming techniques on a GPU-based (Graphics Processing Unit based) cluster. The time complexity of the algorithm and the parallel implementation details on the clusters are discussed. We also show the performance scaling for increasing Hamiltonian matrix size and increasing number of nodes. The performance evaluation indicates that for a 32³-site Hamiltonian a single GPU achieves performance equivalent to more than 30 CPU cores parallelized using MPI.
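
    The kernel-polynomial machinery at the heart of such O(N) methods needs only matrix-vector products: Chebyshev moments of the rescaled Hamiltonian are estimated with random vectors, then resummed with a damping kernel. A dense-matrix toy sketch of the density-of-states version (the exact diagonalization below is used only to bound the toy spectrum; production codes use cheap bound estimates instead):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N, M, R = 256, 64, 10                 # matrix size, moments, random vectors
    H = rng.normal(size=(N, N)); H = (H + H.T) / np.sqrt(2 * N)
    a = 1.1 * np.max(np.abs(np.linalg.eigvalsh(H)))   # toy-only spectral bound
    Ht = H / a                                        # spectrum now inside (-1, 1)

    mu = np.zeros(M)
    for _ in range(R):                    # stochastic evaluation of Tr T_m(Ht)
        v0 = rng.choice([-1.0, 1.0], size=N)
        t_prev, t_cur = v0, Ht @ v0
        mu[0] += v0 @ v0
        mu[1] += v0 @ t_cur
        for m in range(2, M):
            t_prev, t_cur = t_cur, 2 * Ht @ t_cur - t_prev   # Chebyshev recurrence
            mu[m] += v0 @ t_cur
    mu /= R * N

    m_idx = np.arange(M)                  # Jackson damping suppresses Gibbs ringing
    g = ((M - m_idx + 1) * np.cos(np.pi * m_idx / (M + 1))
         + np.sin(np.pi * m_idx / (M + 1)) / np.tan(np.pi / (M + 1))) / (M + 1)
    wts = np.concatenate(([1.0], 2 * np.ones(M - 1)))
    for E in (-0.5, 0.0, 0.5):            # rescaled energies
        rho = (g * wts * mu * np.cos(m_idx * np.arccos(E))).sum()
        print(E, rho / (np.pi * np.sqrt(1 - E**2)))
    ```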

  13. A parallel implementation of the analytic nuclear gradient for time-dependent density functional theory within the Tamm-Dancoff approximation

    NASA Astrophysics Data System (ADS)

    Liu, Fenglai; Gan, Zhengting; Shao, Yihan; Hsu, Chao-Ping; Dreuw, Andreas; Head-Gordon, Martin; Miller, Benjamin T.; Brooks, Bernard R.; Yu, Jian-Guo; Furlani, Thomas R.; Kong, Jing

    2010-10-01

    We derived the analytic gradient for the excitation energies from a time-dependent density functional theory calculation within the Tamm-Dancoff approximation (TDDFT/TDA) using Gaussian atomic orbital basis sets, and introduced an efficient serial and parallel implementation. Some timing results are shown from a B3LYP/6-31G**/SG-1-grid calculation on zincporphyrin. We also performed TDDFT/TDA geometry optimizations for low-lying excited states of 20 small molecules, and compared adiabatic excitation energies and optimized geometry parameters to experimental values using the B3LYP and ωB97 functionals. There are only minor differences between TDDFT and TDA optimized excited state geometries and adiabatic excitation energies. Optimized bond lengths are in better agreement with experiment for both functionals than either CC2 or SOS-CIS(D0), while adiabatic excitation energies are in similar or slightly poorer agreement. Optimized bond angles with both functionals are more accurate than CIS values, but less accurate than either CC2 or SOS-CIS(D0) ones.

  14. Drawing a high-resolution functional map of adeno-associated virus capsid by massively parallel sequencing

    PubMed Central

    Adachi, Kei; Enoki, Tatsuji; Kawano, Yasuhiro; Veraz, Michael; Nakai, Hiroyuki

    2014-01-01

    Adeno-associated virus (AAV) capsid engineering is an emerging approach to advance gene therapy. However, a systematic analysis on how each capsid amino acid contributes to multiple functions remains challenging. Here we show proof-of-principle and successful application of a novel approach, termed AAV Barcode-Seq, that allows us to characterize phenotypes of hundreds of different AAV strains in a high-throughput manner and therefore overcomes technical difficulties in the systematic analysis. In this approach, we generate DNA barcode-tagged AAV libraries and determine a spectrum of phenotypes of each AAV strain by Illumina barcode sequencing. By applying this method to AAV capsid mutant libraries tagged with DNA barcodes, we can draw a high-resolution map of AAV capsid amino acids important for the structural integrity and functions including receptor binding, tropism, neutralization and blood clearance. Thus, Barcode-Seq provides a new tool to generate a valuable resource for virus and gene therapy research. PMID:24435020

  15. Resonance line transfer calculations by doubling thin layers. I - Comparison with other techniques. II - The use of the R-parallel redistribution function. [planetary atmospheres

    NASA Technical Reports Server (NTRS)

    Yelle, Roger V.; Wallace, Lloyd

    1989-01-01

    A versatile and efficient technique for the solution of the resonance line scattering problem with frequency redistribution in planetary atmospheres is introduced. Similar to the doubling approach commonly used in monochromatic scattering problems, the technique has been extended to include the frequency dependence of the radiation field. Methods for solving problems with external or internal sources and coupled spectral lines are presented, along with comparison of some sample calculations with results from Monte Carlo and Feautrier techniques. The doubling technique has also been applied to the solution of resonance line scattering problems where the R-parallel redistribution function is appropriate, both neglecting and including polarization as developed by Yelle and Wallace (1989). With the constraint that the atmosphere is illuminated from the zenith, the only difficulty of consequence is that of performing precise frequency integrations over the line profiles. With that problem solved, it is no longer necessary to use the Monte Carlo method to solve this class of problem.
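
    In its simplest (monochromatic, unpolarized, two-stream) form the doubling idea fits in a few lines: start from a layer thin enough that its reflection and transmission are known from single scattering, then repeatedly combine the layer with itself, summing the inter-reflection series at each pass. The frequency-dependent, polarized machinery of the paper builds on the same recursion; the thin-layer initialization below is a crude illustrative estimate:

    ```python
    omega, tau_total, n_doublings = 0.9, 1.0, 20   # single-scattering albedo, depth
    dtau = tau_total / 2**n_doublings              # starting thin-layer thickness

    R = 0.5 * omega * dtau               # half the singly scattered light goes back
    T = 1.0 - dtau + 0.5 * omega * dtau  # direct beam + forward-scattered light
    for _ in range(n_doublings):         # each pass doubles the layer thickness
        denom = 1.0 - R * R              # geometric sum of inter-reflections
        R, T = R + T * R * T / denom, T * T / denom
    print("reflectance:", R, "transmittance:", T)
    ```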

  16. Parallel Information Processing.

    ERIC Educational Resources Information Center

    Rasmussen, Edie M.

    1992-01-01

    Examines parallel computer architecture and the use of parallel processors for text. Topics discussed include parallel algorithms; performance evaluation; parallel information processing; parallel access methods for text; parallel and distributed information retrieval systems; parallel hardware for text; and network models for information…

  17. NOCA-1 functions with γ-tubulin and in parallel to Patronin to assemble non-centrosomal microtubule arrays in C. elegans

    PubMed Central

    Wang, Shaohe; Wu, Di; Quintin, Sophie; Green, Rebecca A; Cheerambathur, Dhanya K; Ochoa, Stacy D; Desai, Arshad; Oegema, Karen

    2015-01-01

    Non-centrosomal microtubule arrays assemble in differentiated tissues to perform mechanical and transport-based functions. In this study, we identify Caenorhabditis elegans NOCA-1 as a protein with homology to vertebrate ninein. NOCA-1 contributes to the assembly of non-centrosomal microtubule arrays in multiple tissues. In the larval epidermis, NOCA-1 functions redundantly with the minus end protection factor Patronin/PTRN-1 to assemble a circumferential microtubule array essential for worm growth and morphogenesis. Controlled degradation of a γ-tubulin complex subunit in this tissue revealed that γ-tubulin acts with NOCA-1 in parallel to Patronin/PTRN-1. In the germline, NOCA-1 and γ-tubulin co-localize at the cell surface, and inhibiting either leads to a microtubule assembly defect. γ-tubulin targets independently of NOCA-1, but NOCA-1 targeting requires γ-tubulin when a non-essential putatively palmitoylated cysteine is mutated. These results show that NOCA-1 acts with γ-tubulin to assemble non-centrosomal arrays in multiple tissues and highlight functional overlap between the ninein and Patronin protein families. DOI: http://dx.doi.org/10.7554/eLife.08649.001 PMID:26371552

  18. Fracture problem for an external circumferential crack in a functionally graded superconducting cylinder subjected to a parallel magnetic field

    NASA Astrophysics Data System (ADS)

    Yan, Z.; Gao, S. W.; Feng, W. J.

    2016-02-01

    In this study, the multiple isoparametric finite element method (MIFEM) is used to investigate the external circumferential crack problem of a functionally graded superconducting cylinder subjected to electromagnetic forces. The superconducting cylinder is composed of a Bi2223/Ag composite with spatially varying material parameters. A crack reference region is defined to reflect the effects of the crack on flux and current densities, and the magnetically impermeable crack surface condition and the generalized Irie-Yamafuji critical state model outside the crack region are adopted. The distributions of magnetic flux density in the superconducting cylinder are obtained analytically for both the zero-field-cooling (ZFC) and the field-cooling (FC) activation processes. Based on the MIFEM, the stress intensity factors (SIFs) at the crack fronts during field ascent and/or descent are then numerically calculated. Numerical results show that, for the present crack model, the crack propagates readily as the applied field increases during the ZFC activation process, whereas during field descent in either the ZFC or the FC case the crack generally does not propagate. In addition, during field ascent in the ZFC case, the SIFs depend not only on the crack depths and model parameters but also on the applied field. The present study should be helpful for the design and application of high-temperature superconductors with external edge cracks.

  19. Parallel processor engine model program

    NASA Technical Reports Server (NTRS)

    Mclaughlin, P.

    1984-01-01

    The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.

  20. Parallel assessment of male reproductive function in workers and wild rats exposed to pesticides in banana plantations in Guadeloupe

    PubMed Central

    Multigner, Luc; Kadhel, Philippe; Pascal, Michel; Huc-Terki, Farida; Kercret, Henri; Massart, Catherine; Janky, Eustase; Auger, Jacques; Jégou, Bernard

    2008-01-01

    Background: There is increasing evidence that reproductive abnormalities are increasing in frequency in both human populations and wild fauna. This increase is probably related to exposure to toxic contaminants in the environment. The use of sentinel species to raise alarms relating to human reproductive health has been strongly recommended. However, no simultaneous studies at the same site have been carried out in recent decades to evaluate the utility of wild animals for monitoring human reproductive disorders. We carried out a joint study in Guadeloupe assessing the reproductive function of workers exposed to pesticides in banana plantations and of male wild rats living in these plantations. Methods: A cross-sectional study was performed to assess semen quality and reproductive hormones in banana workers and in men working in non-agricultural sectors. These reproductive parameters were also assessed in wild rats captured in the plantations and were compared with those in rats from areas not directly polluted by humans. Results: No significant difference in sperm characteristics and/or hormones was found between workers exposed and not exposed to pesticides. By contrast, rats captured in the banana plantations had lower testosterone levels and gonadosomatic indices than control rats. Conclusion: Wild rats seem to be more sensitive than humans to the effects of pesticide exposure on reproductive health. We conclude that the concept of sentinel species must be carefully validated, as the actual nature of exposure, the vulnerable time period of exposure, and various ecological factors may vary between humans and wild species. PMID:18667078

  1. FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

    NASA Astrophysics Data System (ADS)

    Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

    2009-12-01

    FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems.

    Program summary
    Program title: FILMPAR
    Catalogue identifier: AEEL_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 530 421
    No. of bytes in distributed program, including test data, etc.: 1 960 313
    Distribution format: tar.gz
    Programming language: C++ and MPI
    Computer: Desktop, server
    Operating system: Unix/Linux, Mac OS X
    Has the code been vectorised or parallelised?: Yes, tested with up to 128 processors
    RAM: 512 MBytes
    Classification: 12
    External routines: GNU C/C++, MPI
    Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering
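
    The multigrid engine that FILMPAR parallelizes can be illustrated serially on a 1D Poisson model problem; the weighted-Jacobi smoother, full-weighting restriction, and linear-interpolation prolongation below are textbook choices rather than the paper's lubrication-equation discretisation:

    ```python
    import numpy as np

    def smooth(u, f, h, sweeps=3):
        for _ in range(sweeps):              # weighted Jacobi relaxation
            u[1:-1] += 0.8 * (0.5 * (u[:-2] + u[2:] + h * h * f[1:-1]) - u[1:-1])
        return u

    def v_cycle(u, f, h):
        u = smooth(u, f, h)
        if len(u) <= 3:
            return u
        r = np.zeros_like(u)                 # residual of -u'' = f
        r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h**2
        rc = np.zeros((len(u) + 1) // 2)     # full-weighting restriction
        rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
        ec = v_cycle(np.zeros_like(rc), rc, 2 * h)
        e = np.zeros_like(u)                 # linear-interpolation prolongation
        e[::2], e[1::2] = ec, 0.5 * (ec[:-1] + ec[1:])
        return smooth(u + e, f, h)

    n = 257
    x = np.linspace(0.0, 1.0, n); h = x[1] - x[0]
    f = np.pi**2 * np.sin(np.pi * x)         # exact solution is u = sin(pi x)
    u = np.zeros(n)
    for _ in range(8):
        u = v_cycle(u, f, h)
    print("max error:", np.max(np.abs(u - np.sin(np.pi * x))))
    ```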

  2. Executive functioning as a mediator of conduct problems prevention in children of homeless families residing in temporary supportive housing: a parallel process latent growth modeling approach.

    PubMed

    Piehler, Timothy F; Bloomquist, Michael L; August, Gerald J; Gewirtz, Abigail H; Lee, Susanne S; Lee, Wendy S C

    2014-01-01

    A culturally diverse sample of formerly homeless youth (ages 6-12) and their families (n = 223) participated in a cluster randomized controlled trial of the Early Risers conduct problems prevention program in a supportive housing setting. Parents provided 4 annual behaviorally-based ratings of executive functioning (EF) and conduct problems, including at baseline, over 2 years of intervention programming, and at a 1-year follow-up assessment. Using intent-to-treat analyses, a multilevel latent growth model revealed that the intervention group demonstrated reduced growth in conduct problems over the 4 assessment points. In order to examine mediation, a multilevel parallel process latent growth model was used to simultaneously model growth in EF and growth in conduct problems along with intervention status as a covariate. A significant mediational process emerged, with participation in the intervention promoting growth in EF, which predicted negative growth in conduct problems. The model was consistent with changes in EF fully mediating intervention-related changes in youth conduct problems over the course of the study. These findings highlight the critical role that EF plays in behavioral change and lends further support to its importance as a target in preventive interventions with populations at risk for conduct problems. PMID:24141709

  4. Parallel Programming in the Age of Ubiquitous Parallelism

    NASA Astrophysics Data System (ADS)

    Pingali, Keshav

    2014-04-01

    Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs.

  5. Ultrascalable petaflop parallel supercomputer

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  6. Special parallel processing workshop

    SciTech Connect

    1994-12-01

    This report contains viewgraphs from the Special Parallel Processing Workshop. The viewgraphs cover topics such as parallel processing performance, message passing, and queue structure, along with other basic concepts of parallel processing.

  7. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted dealt with rule-based expert systems and the algorithms that may lead to their effective parallelization. Both the forward and backward chained control paradigms were investigated in the course of this work, as were the best computer architectures for the developed algorithms. Two experimental vehicles were developed to facilitate this research: Backpac, a parallel backward chained rule-based reasoning system, and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct future. Applying future to an expression causes it to be evaluated as a task running in parallel with the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors: an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32-processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines; the Multimax has all its processors hung off a common bus. All are shared-memory machines, but they have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10-processor Encore and on the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped-down version of EMYCIN.
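
    Multilisp's future evaluates an expression in a task that runs alongside its parent, with the value demanded on first touch; Python's concurrent.futures offers a close analogue. A toy sketch with a placeholder match function standing in for Backpac/Datapac's rule matching:

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def match_rules(partition):
        return [r for r in partition if r % 7 == 0]   # placeholder "match" work

    rule_partitions = [range(i, i + 1000) for i in range(0, 4000, 1000)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(match_rules, p) for p in rule_partitions]  # spawn
        fired = [r for fut in futures for r in fut.result()]  # .result() blocks
    print(len(fired))
    ```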

  8. Matpar: Parallel Extensions for MATLAB

    NASA Technical Reports Server (NTRS)

    Springer, P. L.

    1998-01-01

    Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.

  9. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions ensuring the convergence of its iterates and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  10. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper describes recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques, and presents rendering algorithms, developed for massively parallel processors (MPPs), for polygon, sphere, and volumetric data. The polygon algorithm uses a data parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  11. Parallel Analysis of mRNA and microRNA Microarray Profiles to Explore Functional Regulatory Patterns in Polycystic Kidney Disease: Using PKD/Mhm Rat Model

    PubMed Central

    Dweep, Harsh; Sticht, Carsten; Kharkar, Asawari; Pandey, Priyanka; Gretz, Norbert

    2013-01-01

    Autosomal polycystic kidney disease (ADPKD) is a frequent monogenic renal disease, characterised by fluid-filled cysts that are thought to result from multiple deregulated pathways such as cell proliferation and apoptosis. MicroRNAs (miRNAs) are small non-coding RNAs that regulate the expression of many genes associated with such biological processes and human pathologies. To explore the possible regulatory role of miRNAs in PKD, the PKD/Mhm (cy/+) rat served as a model to study human ADPKD. A parallel microarray-based approach was conducted to profile the expression changes of mRNAs and miRNAs in PKD/Mhm rats. 1,573 up- and 1,760 down-regulated genes were differentially expressed in PKD/Mhm rats. These genes are associated with 17 pathways (such as focal adhesion, cell cycle, ECM-receptor interaction, DNA replication and metabolic pathways) and 47 Gene Ontologies (e.g., cell proliferation, Wnt and Tgfβ signaling). Furthermore, we found similar expression patterns of deregulated genes between the PKD/Mhm (cy/+) rat and the human ADPKD, PKD1L3/L3, PKD1−/−, Hnf1α-deficient, and Glis2lacZ/lacZ models. Additionally, several differentially regulated genes were noted to be target hubs for miRNAs. We also obtained 8 significantly up-regulated miRNAs (rno-miR-199a-5p, −214, −146b, −21, −34a, −132, −31 and −503) in diseased kidneys of PKD/Mhm rats. Additionally, binding site overrepresentation and pathway enrichment analyses were accomplished on the putative targets of these 8 miRNAs. 7 out of these 8 miRNAs and their possible interactions have not been previously described in ADPKD. We have shown a strong overlap of functional patterns (pathways) between deregulated miRNAs and mRNAs in the PKD/Mhm (cy/+) rat model. Our findings suggest that several miRNAs may be associated in regulating pathways in ADPKD. We further describe novel miRNAs and their possible targets in ADPKD, which will open new avenues to understand the pathogenesis of human ADPKD.

  12. MPP parallel forth

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    1987-01-01

    Massively Parallel Processor (MPP) Parallel FORTH is a derivative of FORTH-83 and Unified Software Systems' Uni-FORTH. The extension of FORTH into the realm of parallel processing on the MPP is described. With few exceptions, Parallel FORTH was made to follow the description of Uni-FORTH as closely as possible. Likewise, the Parallel FORTH extensions were designed to be as philosophically similar to serial FORTH as possible. The MPP hardware characteristics, as viewed by the FORTH programmer, are discussed, followed by a description of how Parallel FORTH is implemented on the MPP.

  13. Bounded Parallel-Batch Scheduling on Unrelated Parallel Machines

    NASA Astrophysics Data System (ADS)

    Miao, Cuixia; Zhang, Yuzhong; Wang, Chengfei

    In this paper, we consider the bounded parallel-batch scheduling problem on unrelated parallel machines. Problems R_m|B|F are NP-hard for any objective function F. For this reason, we discuss the special case with p_ij = p_i for i = 1, 2, …, m, j = 1, 2, …, n. We give optimal algorithms for the general scheduling objectives of minimizing total weighted completion time, makespan, and the number of tardy jobs. We also design pseudo-polynomial time algorithms for the case with a rejection penalty, minimizing the makespan and the total weighted completion time plus the total penalty of the rejected jobs, respectively.

  14. Parallel flow diffusion battery

    DOEpatents

    Yeh, Hsu-Chi; Cheng, Yung-Sung

    1984-08-07

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  16. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  17. Parallel Atomistic Simulations

    SciTech Connect

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.

  18. Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

    SciTech Connect

    Azmy, Y.Y.; Barnett, D.A.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared to measurements from applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  20. Linearly exact parallel closures for slab geometry

    NASA Astrophysics Data System (ADS)

    Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

    2013-08-01

    Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).

  2. Improving the spatial accuracy in functional magnetic resonance imaging (fMRI) based on the blood oxygenation level dependent (BOLD) effect: benefits from parallel imaging and a 32-channel head array coil at 1.5 Tesla.

    PubMed

    Fellner, C; Doenitz, C; Finkenzeller, T; Jung, E M; Rennert, J; Schlaier, J

    2009-01-01

    Geometric distortions and low spatial resolution are current limitations in functional magnetic resonance imaging (fMRI). The aim of this study was to evaluate whether application of parallel imaging, or a significant reduction of voxel size in combination with a new 32-channel head array coil, can reduce those drawbacks at 1.5 T for a simple hand motor task. Maximum t-values (tmax) in different regions of activation, time-dependent signal-to-noise ratios (SNR(t)), and distortions within the precentral gyrus were evaluated. Comparing fMRI with and without parallel imaging in 17 healthy subjects revealed significantly reduced geometric distortions in the anterior-posterior direction. Using parallel imaging, tmax showed only a mild reduction (7-11%) although SNR(t) was significantly diminished (25%). In 7 healthy subjects, high-resolution (2 × 2 × 2 mm³) fMRI was compared with standard fMRI (3 × 3 × 3 mm³) in a 32-channel coil and with high-resolution fMRI in a 12-channel coil. The new coil yielded a clear improvement in tmax (21-32%) and SNR(t) (51%) in comparison with the 12-channel coil. Geometric distortions were smaller due to the smaller voxel size. Therefore, the reduction in tmax (8-16%) and SNR(t) (52%) in the high-resolution experiment seems tolerable with this coil. In conclusion, parallel imaging is an alternative way to reduce geometric distortions in fMRI at 1.5 T. Using a 32-channel coil, reduction of the voxel size might be the preferable way to improve spatial accuracy. PMID:19713602

  3. Parallel digital forensics infrastructure.

    SciTech Connect

    Liebrock, Lorie M.; Duggan, David Patrick

    2009-10-01

    This report documents the architecture and implementation of a parallel digital forensics (PDF) infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics.

  4. A parallel variable metric optimization algorithm

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.

    1973-01-01

    An algorithm designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariate minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
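
    A minimal sketch of one such cycle on a quadratic test problem; the SR1-style rank-one update, the random probe steps, and the exact line search are illustrative assumptions, not Straeter's exact formulas (the p gradient evaluations are what would run in parallel):

    ```python
    import numpy as np

    def f(x):      # quadratic test problem: f(x) = 0.5 x^T A x - b^T x
        return 0.5 * x @ A @ x - b @ x

    def grad(x):
        return A @ x - b

    rng = np.random.default_rng(0)
    n, p = 6, 6                       # dimension and degree of parallelism
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)       # SPD Hessian
    b = rng.standard_normal(n)

    x = np.zeros(n)
    H = np.eye(n)                     # inverse-metric estimate
    for cycle in range(3):
        g = grad(x)
        steps = [1e-3 * rng.standard_normal(n) for _ in range(p)]
        grads = [grad(x + s) for s in steps]   # the p parallelizable evaluations
        for s, gs in zip(steps, grads):        # p rank-one (SR1-style) corrections
            y = gs - g
            u = s - H @ y
            denom = u @ y
            if abs(denom) > 1e-12:             # guard against breakdown
                H += np.outer(u, u) / denom
        d = -H @ g                             # Newton-like direction
        t = -(g @ d) / (d @ A @ d)             # exact line search on the quadratic
        x = x + t * d
    print("f(x):", f(x), " gradient norm:", np.linalg.norm(grad(x)))
    ```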

  5. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C³I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.

  6. Parallel MR Imaging

    PubMed Central

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A.; Seiberlich, Nicole

    2015-01-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the under-sampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. PMID:22696125
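
    A toy 1-D sketch of the SENSE idea for rate-2 undersampling: each aliased pixel is a coil-weighted sum of the two pixels that fold together, so known sensitivities turn unaliasing into a tiny least-squares solve per pixel (the synthetic data, profiles, and names are assumptions, not a clinical reconstruction):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    FOV, ncoil = 64, 4
    truth = rng.uniform(0, 1, FOV)                   # 1-D "image"
    rows = np.arange(FOV)
    # smooth synthetic coil sensitivity profiles (assumed known)
    S = np.array([np.exp(-((rows - c * FOV / (ncoil - 1)) / FOV) ** 2)
                  for c in range(ncoil)])            # shape (ncoil, FOV)

    half = FOV // 2                                  # rate-2 aliasing: rows r and
    aliased = S[:, :half] * truth[:half] + S[:, half:] * truth[half:]  # r+FOV/2 fold

    recon = np.zeros(FOV)
    for r in range(half):
        E = np.stack([S[:, r], S[:, r + half]], axis=1)   # ncoil x 2 encoding matrix
        sol, *_ = np.linalg.lstsq(E, aliased[:, r], rcond=None)
        recon[r], recon[r + half] = sol
    print("max reconstruction error:", np.abs(recon - truth).max())
    ```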

  7. Social Problems and Deviance: Some Parallel Issues

    ERIC Educational Resources Information Center

    Kitsuse, John I.; Spector, Malcolm

    1975-01-01

    Explores parallel developments in labeling theory and in the value conflict approach to social problems. Similarities in their critiques of functionalism and etiological theory as well as their emphasis on the definitional process are noted. (Author)

  8. Parallel in vivo and in vitro detection of functional somatostatin receptors in human endocrine pancreatic tumors: Consequences with regard to diagnosis, localization, and therapy

    SciTech Connect

    Lamberts, S.W.; Hofland, L.J.; van Koetsveld, P.M.; Reubi, J.C.; Bruining, H.A.; Bakker, W.H.; Krenning, E.P.

    1990-09-01

    The effects of octreotide in vivo and in vitro on hormone release, in vivo (¹²³I)Tyr3-octreotide scanning, and in vitro (¹²⁵I)Tyr3-octreotide autoradiography were compared in five patients with endocrine pancreatic tumors. (¹²³I)Tyr3-octreotide scanning localized the primary tumor and/or previously unknown metastases in four of the five patients. The patient with a negative scan had an insulinoma that did not respond to octreotide in vivo. No Tyr3-octreotide-binding sites were subsequently found at autoradiography of the tumor, whereas somatostatin-14 receptors were present at a high density. In parallel, culture studies with the cells prepared from this adenoma showed that insulin release was not affected by octreotide, while both somatostatin-14 and -28 significantly suppressed hormone release. Culture studies of the tumor cells from two gastrinomas showed a dose-dependent inhibition of gastrin release by octreotide. Octreotide exerted direct antiproliferative effects in one of these gastrinomas, which had been shown to be rapidly growing in vivo. Both gastrinomas had specific somatostatin receptors, as measured by in vitro receptor autoradiography. Somatostatin release by the cultured somatostatinoma cells from one of these patients was suppressed by octreotide.

  9. Parallel scheduling algorithms

    SciTech Connect

    Dekel, E.; Sahni, S.

    1983-01-01

    Parallel algorithms are given for scheduling problems such as scheduling to minimize the number of tardy jobs, job sequencing with deadlines, scheduling to minimize earliness and tardiness penalties, channel assignment, and minimizing the mean finish time. The shared memory model of parallel computers is used to obtain fast algorithms. 26 references.

  10. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
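
    A minimal segmented-sieve sketch of the same idea, with Python worker processes standing in for hypercube nodes; the simple block decomposition below is an assumption for illustration, not the paper's scattered decomposition:

    ```python
    from math import isqrt
    from multiprocessing import Pool

    def base_primes(limit):
        """Serial Sieve of Eratosthenes for primes <= limit."""
        mark = bytearray([1]) * (limit + 1)
        mark[0:2] = b"\x00\x00"
        for i in range(2, isqrt(limit) + 1):
            if mark[i]:
                mark[i * i :: i] = bytearray(len(mark[i * i :: i]))
        return [i for i, m in enumerate(mark) if m]

    def sieve_segment(args):
        lo, hi, primes = args                # sieve the subrange [lo, hi)
        mark = bytearray([1]) * (hi - lo)
        for p in primes:
            start = max(p * p, (lo + p - 1) // p * p)
            mark[start - lo :: p] = bytearray(len(range(start, hi, p)))
        return [lo + i for i, m in enumerate(mark) if m]

    if __name__ == "__main__":
        n, nworkers = 10**6, 4
        primes = base_primes(isqrt(n))       # shared base primes up to sqrt(n)
        step = (n + nworkers - 1) // nworkers
        chunks = [(lo, min(lo + step, n + 1), primes)
                  for lo in range(2, n + 1, step)]
        with Pool(nworkers) as pool:         # each block sieved independently
            count = sum(len(seg) for seg in pool.map(sieve_segment, chunks))
        print("primes <= 1e6:", count)       # expect 78498
    ```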

  11. Parallel computing works

    SciTech Connect

    Not Available

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  12. Parallel nearest neighbor calculations

    NASA Astrophysics Data System (ADS)

    Trease, Harold

    We are just starting to parallelize the nearest neighbor portion of our free-Lagrange code. Our implementation of the nearest neighbor reconnection algorithm has not been parallelizable (i.e., we just flip one connection at a time). In this paper we consider what sort of nearest neighbor algorithms lend themselves to being parallelized. For example, the construction of the Voronoi mesh can be parallelized, but the construction of the Delaunay mesh (dual to the Voronoi mesh) cannot because of degenerate connections. We will show our most recent attempt to tessellate space with triangles or tetrahedrons with a new nearest neighbor construction algorithm called DAM (Dial-A-Mesh). This method has the characteristics of a parallel algorithm and produces a better tessellation of space than the Delaunay mesh. Parallel processing is becoming an everyday reality for us at Los Alamos. Our current production machines are Cray YMPs with 8 processors that can run independently or combined to work on one job. We are also exploring massive parallelism through the use of two 64K processor Connection Machines (CM2), where all the processors run in lock step mode. The effective application of 3-D computer models requires the use of parallel processing to achieve reasonable "turn around" times for our calculations.

  13. Bilingual parallel programming

    SciTech Connect

    Foster, I.; Overbeek, R.

    1990-01-01

    Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.

  14. The function of the glutamate–nitric oxide–cGMP pathway in brain in vivo and learning ability decrease in parallel in mature compared with young rats

    PubMed Central

    Piedrafita, Blanca; Cauli, Omar; Montoliu, Carmina; Felipo, Vicente

    2007-01-01

    Aging is associated with cognitive impairment, but the underlying mechanisms remain unclear. We have recently reported that the ability of rats to learn a Y-maze conditional discrimination task depends on the function of the glutamate–nitric oxide–cGMP pathway in brain. The aims of the present work were to assess whether the ability of rats to learn this task decreases with age and whether this reduction is associated with a decreased function of the glutamate–nitric oxide–cGMP pathway in brain in vivo, as analyzed by microdialysis in freely moving rats. We show that 7-mo-old rats need significantly more (192 ± 64%) trials than do 3-mo-old rats to learn the Y-maze task. Moreover, the function of the glutamate–nitric oxide–cGMP pathway is reduced by 60 ± 23% in 7-mo-old rats compared with 3-mo-old rats. The results reported support the idea that the reduction in the ability to learn the Y-maze task (and likely other types of learning) of mature compared with young rats would be a consequence of reduced function of the glutamate–nitric oxide–cGMP pathway. PMID:17412964

  15. The function of the glutamate-nitric oxide-cGMP pathway in brain in vivo and learning ability decrease in parallel in mature compared with young rats.

    PubMed

    Piedrafita, Blanca; Cauli, Omar; Montoliu, Carmina; Felipo, Vicente

    2007-04-01

    Aging is associated with cognitive impairment, but the underlying mechanisms remain unclear. We have recently reported that the ability of rats to learn a Y-maze conditional discrimination task depends on the function of the glutamate-nitric oxide-cGMP pathway in brain. The aims of the present work were to assess whether the ability of rats to learn this task decreases with age and whether this reduction is associated with a decreased function of the glutamate-nitric oxide-cGMP pathway in brain in vivo, as analyzed by microdialysis in freely moving rats. We show that 7-mo-old rats need significantly more (192 +/- 64%) trials than do 3-mo-old rats to learn the Y-maze task. Moreover, the function of the glutamate-nitric oxide-cGMP pathway is reduced by 60 +/- 23% in 7-mo-old rats compared with 3-mo-old rats. The results reported support the idea that the reduction in the ability to learn the Y-maze task (and likely other types of learning) of mature compared with young rats would be a consequence of reduced function of the glutamate-nitric oxide-cGMP pathway. PMID:17412964

  16. The Function of the Glutamate-Nitric Oxide-cGMP Pathway in Brain in Vivo and Learning Ability Decrease in Parallel in Mature Compared with Young Rats

    ERIC Educational Resources Information Center

    Piedrafita, Blanca; Cauli, Omar; Montoliu, Carmina; Felipo, Vicente

    2007-01-01

    Aging is associated with cognitive impairment, but the underlying mechanisms remain unclear. We have recently reported that the ability of rats to learn a Y-maze conditional discrimination task depends on the function of the glutamate-nitric oxide-cGMP pathway in brain. The aims of the present work were to assess whether the ability of rats to…

  17. Expression of mitochondrial regulatory genes parallels respiratory capacity and contractile function in a rat model of hypoxia-induced right ventricular hypertrophy

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Chronic hypobaric hypoxia (CHH) increases load on the right ventricle (RV) resulting in RV hypertrophy. We hypothesized that CHH elicits distinct responses, i.e., the hypertrophied RV, unlike the left ventricle (LV), displaying enhanced mitochondrial respiratory and contractile function. Wistar rats...

  18. Processing Semblances Induced through Inter-Postsynaptic Functional LINKs, Presumed Biological Parallels of K-Lines Proposed for Building Artificial Intelligence

    PubMed Central

    Vadakkan, Kunjumon I.

    2011-01-01

    The internal sensation of memory, which is available only to the owner of an individual nervous system, is difficult to analyze for its basic elements of operation. We hypothesize that associative learning induces the formation of a functional LINK between the postsynapses. During memory retrieval, the activation of either postsynapse re-activates the functional LINK, evoking a semblance of sensory activity arriving at its opposite postsynapse, the nature of which defines the basic unit of internal sensation – namely, the semblion. In neuronal networks that undergo continuous oscillatory activity at certain levels of their organization, re-activation of functional LINKs is expected to induce semblions, enabling the system to continuously learn, self-organize, and demonstrate instantiation, features that can be utilized for developing artificial intelligence (AI). This paper also explains the suitability of the inter-postsynaptic functional LINKs to meet the expectations of Minsky’s K-lines, basic elements of a memory theory generated to develop AI, and methods to replicate semblances outside the nervous system. PMID:21845180

  19. Mirror versus parallel bimanual reaching

    PubMed Central

    2013-01-01

    Background In spite of their importance to everyday function, tasks that require both hands to work together such as lifting and carrying large objects have not been well studied and the full potential of how new technology might facilitate recovery remains unknown. Methods To help identify the best modes for self-teleoperated bimanual training, we used an advanced haptic/graphic environment to compare several modes of practice. In a 2-by-2 study, we compared mirror vs. parallel reaching movements, and also compared veridical display to one that transforms the right hand’s cursor to the opposite side, reducing the area that the visual system has to monitor. Twenty healthy, right-handed subjects (5 in each group) practiced 200 movements. We hypothesized that parallel reaching movements would be the best performing, and attending to one visual area would reduce the task difficulty. Results The two-way comparison revealed that mirror movement times took an average 1.24 s longer to complete than parallel. Surprisingly, subjects’ movement times moving to one target (attending to one visual area) also took an average of 1.66 s longer than subjects moving to two targets. For both hands, there was also a significant interaction effect, revealing the lowest errors for parallel movements moving to two targets (p < 0.001). This was the only group that began and maintained low errors throughout training. Conclusion Combined with other evidence, these results suggest that the most intuitive reaching performance can be observed with parallel movements with a veridical display (moving to two separate targets). These results point to the expected levels of challenge for these bimanual training modes, which could be used to advise therapy choices in self-neurorehabilitation. PMID:23837908

  20. Parallel system simulation

    SciTech Connect

    Tai, H.M.; Saeks, R.

    1984-03-01

    A relaxation algorithm for solving large-scale system simulation problems in parallel is proposed. The algorithm, which is composed of both a time-step parallel algorithm and a component-wise parallel algorithm, is described. The interconnected nature of the system, which is characterized by the component connection model, is fully exploited by this approach. A technique for finding an optimal number of time steps is also described. Finally, this algorithm is illustrated via several examples in which the possible trade-offs between the speed-up ratio, efficiency, and waiting time are analyzed.
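
    A Gauss-Jacobi waveform-relaxation sketch of component-wise parallelism: each component of a small coupled linear system is integrated over the whole time window using the other components' waveforms from the previous sweep (a generic stand-in, not the paper's component-connection-model algorithm):

    ```python
    import numpy as np

    A = np.array([[-2.0, 1.0],
                  [ 1.0, -2.0]])           # coupled system y' = A y, y(0) = (1, 1)
    T, m = 2.0, 2000
    t, h = np.linspace(0.0, T, m + 1, retstep=True)
    y = np.ones((2, m + 1))                # initial guess: constant waveforms

    for sweep in range(20):
        prev = y.copy()
        for i in range(2):                 # component solves are independent,
            w = np.empty(m + 1)            # so they could run in parallel
            w[0] = 1.0
            for k in range(m):             # forward Euler for component i using
                coupling = sum(A[i, j] * prev[j, k]   # the other components'
                               for j in range(2) if j != i)  # old waveforms
                w[k + 1] = w[k] + h * (A[i, i] * w[k] + coupling)
            y[i] = w

    # with y(0) = (1, 1), the exact solution is y_i(t) = exp(-t) for both i
    print("max error vs exp(-t):", np.abs(y - np.exp(-t)).max())
    ```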

  1. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  2. NWChem: scalable parallel computational chemistry

    SciTech Connect

    van Dam, Hubertus JJ; De Jong, Wibe A.; Bylaska, Eric J.; Govind, Niranjan; Kowalski, Karol; Straatsma, TP; Valiev, Marat

    2011-11-01

    NWChem is a general purpose computational chemistry code specifically designed to run on distributed memory parallel computers. The core functionality of the code focuses on molecular dynamics, Hartree-Fock and density functional theory methods for both plane-wave basis sets as well as Gaussian basis sets, tensor contraction engine based coupled cluster capabilities and combined quantum mechanics/molecular mechanics descriptions. It was realized from the beginning that scalable implementations of these methods required a programming paradigm inherently different from what message passing approaches could offer. In response a global address space library, the Global Array Toolkit, was developed. The programming model it offers is based on using predominantly one-sided communication. This model underpins most of the functionality in NWChem and the power of it is exemplified by the fact that the code scales to tens of thousands of processors. In this paper the core capabilities of NWChem are described as well as their implementation to achieve an efficient computational chemistry code with high parallel scalability. NWChem is a modern, open source, computational chemistry code specifically designed for large scale parallel applications. To meet the challenges of developing efficient, scalable and portable programs of this nature a particular code design was adopted. This code design involved two main features. First of all, the code is built up in a modular fashion so that a large variety of functionality can be integrated easily. Secondly, to facilitate writing complex parallel algorithms the Global Array toolkit was developed. This toolkit allows one to write parallel applications in a shared memory like approach, but offers additional mechanisms to exploit data locality to lower communication overheads. This framework has proven to be very successful in computational chemistry but is applicable to any engineering domain. Within the context created by the features…

  3. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as benchmark for testing high-end parallel computers.
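
    A minimal state-vector simulator sketch illustrating why memory is the limit: the state of n qubits is a complex vector of length 2**n, so 36 qubits already need roughly a terabyte (all names below are illustrative; this is not the paper's software):

    ```python
    import numpy as np

    def apply_1q(state, gate, target, n):
        """Apply a 2x2 gate to qubit `target` (qubit 0 = most significant bit)."""
        psi = state.reshape([2] * n)
        psi = np.tensordot(gate, psi, axes=([1], [target]))
        psi = np.moveaxis(psi, 0, target)
        return psi.reshape(-1)

    def apply_cnot(state, control, target, n):
        """CNOT as an index permutation of the state vector."""
        new = state.copy()
        tmask = 1 << (n - 1 - target)
        for i in range(len(state)):
            if (i >> (n - 1 - control)) & 1:   # control bit set: flip target
                new[i] = state[i ^ tmask]
        return new

    n = 2
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    state = np.zeros(2**n, dtype=complex)
    state[0] = 1.0                             # |00>
    state = apply_1q(state, H, 0, n)
    state = apply_cnot(state, 0, 1, n)
    print(state)                               # Bell state: [0.707, 0, 0, 0.707]
    ```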

  4. Parallels with nature

    NASA Astrophysics Data System (ADS)

    2014-10-01

    Adam Nelson and Stuart Warriner, from the University of Leeds, talk with Nature Chemistry about their work to develop viable synthetic strategies for preparing new chemical structures in parallel with the identification of desirable biological activity.

  5. The Parallel Axiom

    ERIC Educational Resources Information Center

    Rogers, Pat

    1972-01-01

    Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclid's parallel postulate introduces non-Euclidean geometries. Poincaré's model for a non-Euclidean geometry is defined and analyzed. (LS)

  6. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

  7. Partitioning and parallel radiosity

    NASA Astrophysics Data System (ADS)

    Merzouk, S.; Winkler, C.; Paul, J. C.

    1996-03-01

    This paper proposes a theoretical framework, based on domain subdivision, for parallel radiosity. In addition, three different implementation approaches, taking advantage of partitioning algorithms and a global shared memory architecture, are presented.

  8. Simplified Parallel Domain Traversal

    SciTech Connect

    Erickson III, David J

    2011-01-01

    Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO₂ and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.

  9. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth

  10. Parallel image compression

    NASA Technical Reports Server (NTRS)

    Reif, John H.

    1987-01-01

    A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
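
    A minimal vector-quantization sketch of the lossy half of this strategy: each image block is replaced by the index of its nearest codebook vector, and the independent distance rows parallelize trivially (the random codebook and sizes are illustrative assumptions; the adaptive on-line and MPP-specific aspects are not reproduced):

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    blocks = rng.uniform(0, 255, (1024, 16))     # 4x4 image blocks, flattened
    # toy codebook: 32 codewords sampled from the data itself
    codebook = blocks[rng.choice(len(blocks), 32, replace=False)]

    # nearest-codeword assignment: each of the 1024 rows is independent,
    # so this step distributes trivially across processors
    d2 = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    indices = d2.argmin(axis=1)                  # the compressed representation
    decoded = codebook[indices]                  # lossy reconstruction
    rate = indices.size * 5 / (blocks.size * 8)  # 5 bits/index vs 8 bits/pixel
    print("compression ratio:", 1 / rate, " MSE:", ((decoded - blocks) ** 2).mean())
    ```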

  11. Continuous parallel coordinates.

    PubMed

    Heinrich, Julian; Weiskopf, Daniel

    2009-01-01

    Typical scientific data is represented on a grid with appropriate interpolation or approximation schemes, defined on a continuous domain. The visualization of such data in parallel coordinates may reveal patterns latently contained in the data and thus can improve the understanding of multidimensional relations. In this paper, we adopt the concept of continuous scatterplots for the visualization of spatially continuous input data to derive a density model for parallel coordinates. Based on the point-line duality between scatterplots and parallel coordinates, we propose a mathematical model that maps density from a continuous scatterplot to parallel coordinates and present different algorithms for both numerical and analytical computation of the resulting density field. In addition, we show how the 2-D model can be used to successively construct continuous parallel coordinates with an arbitrary number of dimensions. Since continuous parallel coordinates interpolate data values within grid cells, a scalable and dense visualization is achieved, which will be demonstrated for typical multi-variate scientific data. PMID:19834230
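
    A discrete sketch of the point-line duality underlying this model: a 2-D sample (x, y) becomes the line from (0, x) to (1, y) between two parallel axes, and accumulating many such lines approximates a density field (a sampling approximation with assumed names, not the paper's analytic construction):

    ```python
    import numpy as np

    def pc_density(xs, ys, res=64):
        """Accumulate dual lines (0, x)-(1, y) on a res x res grid."""
        density = np.zeros((res, res))
        t = np.linspace(0.0, 1.0, res)          # horizontal position between axes
        for x, y in zip(xs, ys):
            v = (1 - t) * x + t * y             # height of the dual line at t
            rows = np.clip((v * (res - 1)).astype(int), 0, res - 1)
            density[rows, np.arange(res)] += 1.0
        return density

    rng = np.random.default_rng(1)
    xs = rng.uniform(0, 1, 5000)
    ys = np.clip(xs + 0.1 * rng.standard_normal(5000), 0, 1)  # correlated pair
    d = pc_density(xs, ys)
    # strong positive correlation shows up as nearly horizontal dual lines,
    # i.e., density mass concentrated along rows of the image
    print(d.sum(), d.max())
    ```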

  12. Parallel re-modeling of EF-1α function: divergent EF-1α genes co-occur with EFL genes in diverse distantly related eukaryotes

    PubMed Central

    2013-01-01

    Background Elongation factor-1α (EF-1α) and elongation factor-like (EFL) proteins are functionally homologous to one another, and are core components of the eukaryotic translation machinery. The patchy distribution of the two elongation factor types across global eukaryotic phylogeny is suggestive of a ‘differential loss’ hypothesis that assumes that EF-1α and EFL were present in the most recent common ancestor of eukaryotes followed by independent differential losses of one of the two factors in the descendant lineages. To date, however, just one diatom and one fungus have been found to have both EF-1α and EFL (dual-EF-containing species). Results In this study, we characterized 35 new EF-1α/EFL sequences from phylogenetically diverse eukaryotes. In so doing we identified 11 previously unreported dual-EF-containing species from diverse eukaryote groups including the Stramenopiles, Apusomonadida, Goniomonadida, and Fungi. Phylogenetic analyses suggested vertical inheritance of both genes in each of the dual-EF lineages. In the dual-EF-containing species we identified, the EF-1α genes appeared to be highly divergent in sequence and suppressed at the transcriptional level compared to the co-occurring EFL genes. Conclusions According to the known EF-1α/EFL distribution, the differential loss process should have occurred independently in diverse eukaryotic lineages, and more dual-EF-containing species remain unidentified. We predict that dual-EF-containing species retain the divergent EF-1α homologues only for a sub-set of the original functions. As the dual-EF-containing species are distantly related to each other, we propose that independent re-modelling of EF-1α function took place in multiple branches in the tree of eukaryotes. PMID:23800323

  13. Finite element computation with parallel VLSI

    NASA Technical Reports Server (NTRS)

    Mcgregor, J.; Salama, M.

    1983-01-01

    This paper describes a parallel processing computer consisting of a 16-bit microcomputer as a master processor which controls and coordinates the activities of 8086/8087 VLSI chip set slave processors working in parallel. The hardware is inexpensive and can be flexibly configured and programmed to perform various functions. This makes it a useful research tool for the development of, and experimentation with parallel mathematical algorithms. Application of the hardware to computational tasks involved in the finite element analysis method is demonstrated by the generation and assembly of beam finite element stiffness matrices. A number of possible schemes for the implementation of N-elements on N- or n-processors (N is greater than n) are described, and the speedup factors of their time consumption are determined as a function of the number of available parallel processors.

  14. Two-axis acceleration of functional connectivity magnetic resonance imaging by parallel excitation of phase-tagged slices and half k-space acceleration.

    PubMed

    Jesmanowicz, Andrzej; Nencka, Andrew S; Li, Shi-Jiang; Hyde, James S

    2011-01-01

    Whole brain functional connectivity magnetic resonance imaging requires acquisition of a time course of gradient-recalled (GR) volumetric images. A method is developed to accelerate this acquisition using GR echo-planar imaging and radio frequency (RF) slice phase tagging. For N-fold acceleration, a tailored RF pulse excites N slices using a uniform-field transmit coil. This pulse is the Fourier transform of the profile for the N slices with a predetermined RF phase tag on each slice. A multichannel RF receive coil is used for detection. For n slices, there are n/N groups of slices. Signal-averaged reference images are created for each slice within each slice group for each member of the coil array and used to separate overlapping images that are simultaneously received. The time-overhead for collection of reference images is small relative to the acquisition time of a complete volumetric time course. A least-squares singular value decomposition method allows image separation on a pixel-by-pixel basis. Twofold slice acceleration is demonstrated using an eight-channel RF receive coil, with application to resting-state functional magnetic resonance imaging in the human brain. Data from six subjects at 3 T are reported. The method has been extended to half k-space acquisition, which not only provides additional acceleration, but also facilitates slice separation because of increased signal intensity of the central lines of k-space coupled with reduced susceptibility effects. PMID:22432957

  15. HOPSPACK: Hybrid Optimization Parallel Search Package.

    SciTech Connect

    Gray, Genetha A.; Kolda, Tamara G.; Griffin, Joshua; Taddy, Matt; Martinez-Canales, Monica

    2008-12-01

    In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel Search Package), a new software platform which facilitates combining multiple optimization routines into a single, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The framework is designed such that existing optimization source code can be easily incorporated with minimal code modification. By maintaining the integrity of each individual solver, the strengths and code sophistication of the original optimization package are retained and exploited.

  16. Parallel time integration software

    SciTech Connect

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieving parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.

  17. Parallel time integration software

    Energy Science and Technology Software Center (ESTSC)

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieving parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.
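
    A minimal two-level time-parallel sketch in the spirit of this package, using the simpler parareal iteration rather than MGRIT itself (all names and the test equation y' = -y are illustrative assumptions):

    ```python
    import numpy as np

    lam, T, N = -1.0, 5.0, 100          # dy/dt = lam*y on [0, T], N coarse slices
    dt = T / N

    def coarse(y, dt):                  # one backward-Euler step (cheap, serial)
        return y / (1 - lam * dt)

    def fine(y, dt, m=20):              # m small RK4 steps (expensive, parallel)
        h = dt / m
        for _ in range(m):
            k1 = lam * y
            k2 = lam * (y + 0.5 * h * k1)
            k3 = lam * (y + 0.5 * h * k2)
            k4 = lam * (y + h * k3)
            y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        return y

    y = np.zeros(N + 1)
    y[0] = 1.0
    for n in range(N):                  # initial serial coarse sweep
        y[n + 1] = coarse(y[n], dt)

    for it in range(5):                 # parareal iterations
        F = np.array([fine(y[n], dt) for n in range(N)])  # parallelizable solves
        y_new = np.zeros_like(y)
        y_new[0] = 1.0
        for n in range(N):              # serial coarse correction sweep
            y_new[n + 1] = coarse(y_new[n], dt) + F[n] - coarse(y[n], dt)
        y = y_new

    print("error vs exp(-T):", abs(y[-1] - np.exp(lam * T)))
    ```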

  18. Fus3p and Kss1p control G1 arrest in Saccharomyces cerevisiae through a balance of distinct arrest and proliferative functions that operate in parallel with Far1p.

    PubMed Central

    Cherkasova, V; Lyons, D M; Elion, E A

    1999-01-01

    In Saccharomyces cerevisiae, mating pheromones activate two MAP kinases (MAPKs), Fus3p and Kss1p, to induce G1 arrest prior to mating. Fus3p is known to promote G1 arrest by activating Far1p, which inhibits three Clnp/Cdc28p kinases. To analyze the contribution of Fus3p and Kss1p to G1 arrest that is independent of Far1p, we constructed far1 CLN strains that undergo G1 arrest from increased activation of the mating MAP kinase pathway. We find that Fus3p and Kss1p both control G1 arrest through multiple functions that operate in parallel with Far1p. Fus3p and Kss1p together promote G1 arrest by repressing transcription of G1/S cyclin genes (CLN1, CLN2, CLB5) by a mechanism that blocks their activation by Cln3p/Cdc28p kinase. In addition, Fus3p and Kss1p counteract G1 arrest through overlapping and distinct functions. Fus3p and Kss1p together increase the expression of CLN3 and PCL2 genes that promote budding, and Kss1p inhibits the MAP kinase cascade. Strikingly, Fus3p promotes proliferation by a novel function that is not linked to reduced Ste12p activity or increased levels of Cln2p/Cdc28p kinase. Genetic analysis suggests that Fus3p promotes proliferation through activation of Mcm1p transcription factor that upregulates numerous genes in G1 phase. Thus, Fus3p and Kss1p control G1 arrest through a balance of arrest functions that inhibit the Cdc28p machinery and proliferative functions that bypass this inhibition. PMID:10049917

  19. Parallel optical sampler

    SciTech Connect

    Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A

    2014-05-20

    An optical sampler includes a first and second 1×n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode interconnected Mach-Zehnder Modulator. A method of sampling the optical analog input signal is disclosed.

  20. Coarrays for Parallel Processing

    NASA Technical Reports Server (NTRS)

    Snyder, W. Van

    2011-01-01

    The design of the Coarray feature of Fortran 2008 was guided by answering the question "What is the smallest change required to convert Fortran to a robust and efficient parallel language?" Two fundamental issues that any parallel programming model must address are work distribution and data distribution. In order to coordinate work distribution and data distribution, methods for communication and synchronization must be provided. Although originally designed for Fortran, the Coarray paradigm has stimulated development in other languages. X10, Chapel, UPC, Titanium, and class libraries being developed for C++ have the same conceptual framework.

  1. Speeding up parallel processing

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
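
    A worked comparison of fixed-size (Amdahl) and scaled (Gustafson-style) speedup at p = 1024 processors, which makes the gap between the fixed-size and scalable Sandia results unsurprising (the serial fractions below are hypothetical):

    ```python
    def amdahl(s, p):
        """Fixed-size speedup: S = 1 / (s + (1 - s) / p), s = serial fraction."""
        return 1.0 / (s + (1.0 - s) / p)

    def gustafson(s, p):
        """Scaled speedup (problem grows with p): S = s + (1 - s) * p."""
        return s + (1.0 - s) * p

    p = 1024
    for s in (0.01, 0.001):
        print(f"s={s}: Amdahl {amdahl(s, p):7.1f}   scaled {gustafson(s, p):7.1f}")
    # s=0.01 : Amdahl ~  91.2   scaled ~ 1013.8
    # s=0.001: Amdahl ~ 506.2   scaled ~ 1023.0
    ```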

  2. Programming parallel vision algorithms

    SciTech Connect

    Shapiro, L.G.

    1988-01-01

    Computer vision requires the processing of large volumes of data and requires parallel architectures and algorithms to be useful in real-time, industrial applications. The INSIGHT dataflow language was designed to allow encoding of vision algorithms at all levels of the computer vision paradigm. INSIGHT programs, which are relational in nature, can be translated into a graph structure that represents an architecture for solving a particular vision problem or a configuration of a reconfigurable computational network. The authors consider here INSIGHT programs that produce a parallel net architecture for solving low-, mid-, and high-level vision tasks.

  3. The NAS Parallel Benchmarks

    SciTech Connect

    Bailey, David H.

    2009-11-15

    The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage…

  4. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

  5. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Among the highly parallel computing architectures required for advanced scientific computation, those designated 'MIMD' and 'SIMD' have yielded the best results to date. The present development-status evaluation shows that neither architecture has attained a decisive advantage in the treatment of most near-homogeneous problems; in the case of problems involving numerous dissimilar parts, however, such currently speculative architectures as 'neural networks' or 'data flow' machines may be entailed. Data flow computers are the most practical form of MIMD fine-grained parallel computers yet conceived; they automatically solve the problem of assigning virtual processors to the real processors in the machine.

  6. Asynchronous parallel pattern search for nonlinear optimization

    SciTech Connect

    P. D. Hough; T. G. Kolda; V. J. Torczon

    2000-01-01

    Parallel pattern search (PPS) can be quite useful for engineering optimization problems characterized by a small number of variables (say 10--50) and by expensive objective function evaluations such as complex simulations that take from minutes to hours to run. However, PPS, which was originally designed for execution on homogeneous and tightly-coupled parallel machines, is not well suited to the more heterogeneous, loosely-coupled, and even fault-prone parallel systems available today. Specifically, PPS is hindered by synchronization penalties and cannot recover in the event of a failure. The authors introduce a new asynchronous and fault tolerant parallel pattern search (APPS) method and demonstrate its effectiveness on both simple test problems as well as some engineering optimization problems.
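
    A synchronous pattern-search sketch on a toy problem; the 2n stencil evaluations in each iteration are what PPS distributes to workers, and what the asynchronous variant would instead process as they finish (a generic sketch under stated assumptions, not the APPS algorithm):

    ```python
    import numpy as np

    def pattern_search(f, x0, step=1.0, tol=1e-6, max_iter=1000):
        x, n = np.asarray(x0, float), len(x0)
        fx = f(x)
        while step > tol and max_iter > 0:
            max_iter -= 1
            # 2n poll points along +/- coordinate directions
            trials = [x + step * d * e for d in (+1, -1) for e in np.eye(n)]
            vals = [f(t) for t in trials]     # the parallelizable evaluations
            best = int(np.argmin(vals))
            if vals[best] < fx:               # success: accept the move
                x, fx = trials[best], vals[best]
            else:                             # failure: contract the mesh
                step *= 0.5
        return x, fx

    quad = lambda v: (v[0] - 3.0) ** 2 + (v[1] + 1.0) ** 2
    x, fx = pattern_search(quad, [0.0, 0.0])
    print(x, fx)                              # approaches the minimizer (3, -1)
    ```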

  7. Solution-phase parallel synthesis of a pharmacophore library of HUN-7293 analogues: a general chemical mutagenesis approach to defining structure-function properties of naturally occurring cyclic (depsi)peptides.

    PubMed

    Chen, Yan; Bilban, Melitta; Foster, Carolyn A; Boger, Dale L

    2002-05-15

    HUN-7293 (1), a naturally occurring cyclic heptadepsipeptide, is a potent inhibitor of cell adhesion molecule expression (VCAM-1, ICAM-1, E-selectin), the overexpression of which is characteristic of chronic inflammatory diseases. Representative of a general approach to defining structure-function relationships of such cyclic (depsi)peptides, the parallel synthesis and evaluation of a complete library of key HUN-7293 analogues are detailed enlisting solution-phase techniques and simple acid-base liquid-liquid extractions for isolation and purification of intermediates and final products. Significant to the design of the studies and unique to solution-phase techniques, the library was assembled superimposing a divergent synthetic strategy onto a convergent total synthesis. An alanine scan and N-methyl deletion of each residue of the cyclic heptadepsipeptide identified key sites responsible for or contributing to the biological properties. The simultaneous preparation of a complete set of individual residue analogues further simplifying the structure allowed an assessment of each structural feature of 1, providing a detailed account of the structure-function relationships in a single study. Within this pharmacophore library prepared by systematic chemical mutagenesis of the natural product structure, simplified analogues possessing comparable potency and, in some instances, improved selectivity were identified. One potent member of this library proved to be an additional natural product in its own right, which we have come to refer to as HUN-7293B (8), being isolated from the microbial strain F/94-499709. PMID:11996584

  8. Parallel fast gauss transform

    SciTech Connect

    Sampath, Rahul S; Sundar, Hari; Veerapaneni, Shravan

    2010-01-01

    We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N²) time. The parallel time complexity estimates for our algorithms are O(N/nₚ) for uniform point distributions and O((N/nₚ) log(N/nₚ) + nₚ log nₚ) for non-uniform distributions using nₚ CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits 'diagonal translation'. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer. Our implementation is 'kernel-independent' and can handle other 'Gaussian-type' kernels even when explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.
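
    For orientation, the direct O(N²) baseline that these algorithms accelerate, as a short reference computation (sizes and names are illustrative; the fast tree and plane-wave machinery is not reproduced here):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    N, delta = 1000, 0.05
    x = rng.uniform(0, 1, (N, 3))        # source points
    y = rng.uniform(0, 1, (N, 3))        # target points
    q = rng.standard_normal(N)           # source strengths

    # G(y_i) = sum_j q_j exp(-|y_i - x_j|^2 / delta), via a full N x N
    # pairwise distance matrix -- this is exactly the quadratic cost that
    # the fast Gauss transform avoids
    d2 = ((y[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    G = (np.exp(-d2 / delta) * q).sum(axis=1)
    print(G.shape, G[:3])
    ```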

  9. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  10. Parallel Multigrid Equation Solver

    Energy Science and Technology Software Center (ESTSC)

    2001-09-07

    Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved problems of up to 76 million degrees of freedom, problems in linear elasticity on the ASCI Blue Pacific and ASCI Red machines.

  11. Parallel Dislocation Simulator

    Energy Science and Technology Software Center (ESTSC)

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  12. NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Subhash, Saini; Bailey, David H.; Lasinski, T. A. (Technical Monitor)

    1995-01-01

    The NAS Parallel Benchmarks (NPB) were developed in 1991 at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a pencil-and-paper fashion, i.e., the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. In this paper, we present new NPB performance results for the following systems: (a) Parallel-Vector Processors: Cray C90, Cray T90 and Fujitsu VPP500; (b) Highly Parallel Processors: Cray T3D, IBM SP2 and IBM SP-TN2 (Thin Nodes 2); (c) Symmetric Multiprocessing Processors: Convex Exemplar SPP1000, Cray J90, DEC Alpha Server 8400 5/300, and SGI Power Challenge XL. We also present sustained performance per dollar for Class B LU, SP and BT benchmarks, and we mention NAS's future plans for the NPB.

  13. High performance parallel architectures

    SciTech Connect

    Anderson, R.E.

    1989-09-01

    In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user programmer's point-of-view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.

  14. Parallel hierarchical global illumination

    SciTech Connect

    Snell, Q.O.

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  15. Optical parallel selectionist systems

    NASA Astrophysics Data System (ADS)

    Caulfield, H. John

    1993-01-01

    There are at least two major classes of computers in nature and technology: connectionist and selectionist. A subset of connectionist systems (Turing Machines) dominates modern computing, although another subset (Neural Networks) is growing rapidly. Selectionist machines have unique capabilities which should allow them to do truly creative operations. It is possible to make a parallel optical selectionist system using methods described in this paper.

  16. Parallel hierarchical radiosity rendering

    SciTech Connect

    Carter, M.

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  17. LEWICE droplet trajectory calculations on a parallel computer

    NASA Technical Reports Server (NTRS)

    Caruso, Steven C.

    1993-01-01

    A parallel computer implementation (128 processors) of LEWICE, a NASA Lewis code used to predict the time-dependent ice accretion process for two-dimensional aerodynamic bodies of simple geometries, is described. Two-dimensional parallel droplet trajectory calculations are performed to demonstrate the potential benefits of applying parallel processing to ice accretion analysis. Parallel performance is evaluated as a function of the number of trajectories and the number of processors. For comparison, similar trajectory calculations are performed on single-processor Cray computers, and the best parallel results are found to be 33 and 23 times faster than the Cray X-MP and Y-MP, respectively.

  18. PARALLEL ASSAY OF OXYGEN EQUILIBRIA OF HEMOGLOBIN

    PubMed Central

    Lilly, Laura E.; Blinebry, Sara K.; Viscardi, Chelsea M.; Perez, Luis; Bonaventura, Joe; McMahon, Tim J.

    2013-01-01

    Methods to systematically analyze in parallel the function of multiple protein or cell samples in vivo or ex vivo (i.e. functional proteomics) in a controlled gaseous environment have thus far been limited. Here we describe an apparatus and procedure that enables, for the first time, parallel assay of oxygen equilibria in multiple samples. Using this apparatus, numerous simultaneous oxygen equilibrium curves (OECs) can be obtained under truly identical conditions from blood cell samples or purified hemoglobins (Hbs). We suggest that the ability to obtain these parallel datasets under identical conditions can be of immense value, both to biomedical researchers and clinicians who wish to monitor blood health, and to physiologists studying non-human organisms and the effects of climate change on these organisms. Parallel monitoring techniques are essential in order to better understand the functions of critical cellular proteins. The procedure can be applied to human studies, wherein an OEC can be analyzed in light of an individual’s entire genome. Here, we analyzed intraerythrocytic Hb, a protein that operates at the organism’s environmental interface and then comes into close contact with virtually all of the organism’s cells. The apparatus is theoretically scalable, and establishes a functional proteomic screen that can be correlated with genomic information on the same individuals. This new method is expected to accelerate our general understanding of protein function, an increasingly challenging objective as advances in proteomic and genomic throughput outpace the ability to study proteins’ functional properties. PMID:23827235

  19. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  20. A highly parallel signal processor

    NASA Astrophysics Data System (ADS)

    Bigham, Jackson D., Jr.

    There is an increasing need for signal processors that are functional across a broad range of problems, from radar systems to E-O and ESM applications. To meet this challenge, a signal processing system capable of efficiently meeting the processing requirements of a broad range of avionics sensor systems has been developed. The CDC Parallel Modular Signal Processor (PMSP) is a complete MIL-E-5400-qualified digital signal processing system capable of computation rates greater than 600 MOPS (million operations per second). The signal processing element of the PMSP is the Micro-AFP, an all-VLSI processor capable of executing multiple simultaneous operations. Up to five Micro-AFPs and 12 MB of main store memory (MSM), along with associated control and I/O functions, are contained in the PMSP's standard ATR enclosure.

  1. Hybrid Optimization Parallel Search PACKage

    Energy Science and Technology Software Center (ESTSC)

    2009-11-10

    HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.
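
    To make the evaluation-cache idea concrete, here is a minimal Python sketch (not the HOPSPACK API; the objective function, point encoding, and worker count are illustrative): previously seen points are answered from the cache, and only new points are dispatched for parallel evaluation.

      from multiprocessing import Pool

      def objective(x):                 # hypothetical expensive black-box function
          return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

      cache = {}                        # saved evaluations, in the spirit of HOPSPACK's Cache

      def evaluate_batch(points, workers=4):
          """Evaluate only uncached points in parallel, then answer from the cache."""
          new = list({p for p in points if p not in cache})
          if new:
              with Pool(workers) as pool:
                  cache.update(zip(new, pool.map(objective, new)))
          return [cache[p] for p in points]

      if __name__ == "__main__":
          print(evaluate_batch([(0.0, 0.0), (1.0, -2.0), (0.0, 0.0)]))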

  2. Seeing in parallel

    SciTech Connect

    Little, J.J.; Poggio, T.; Gamble, E.B. Jr.

    1988-01-01

    Computer algorithms have been developed for early vision processes that give separate cues to the distance from the viewer of three-dimensional surfaces, their shape, and their material properties. The MIT Vision Machine is a computer system that integrates several early vision modules to achieve high-performance recognition and navigation in unstructured environments. It is also an experimental environment for theoretical progress in early vision algorithms, their parallel implementation, and their integration. The Vision Machine consists of a movable, two-camera Eye-Head input device and an 8K Connection Machine. The authors have developed and implemented several parallel early vision algorithms that compute edge detection, stereopsis, motion, texture, and surface color in close to real time. The integration stage, based on coupled Markov random field models, leads to a cartoon-like map of the discontinuities in the scene, with partial labeling of the brightness edges in terms of their physical origin.

  3. Parallel Subconvolution Filtering Architectures

    NASA Technical Reports Server (NTRS)

    Gray, Andrew A.

    2003-01-01

    These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete- Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than on the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
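
    A minimal NumPy sketch of the overlap-save building block these architectures parallelize (signal, filter, and FFT size are illustrative): each block is filtered with a small DFT-IDFT pair and the aliased prefix is discarded, so the transform size is set by the desired block rate rather than by the filter order.

      import numpy as np

      def overlap_save(x, h, nfft=64):
          m = len(h)
          hop = nfft - m + 1                      # new output samples per block
          H = np.fft.fft(h, nfft)
          x = np.concatenate([np.zeros(m - 1), x])
          y = []
          for start in range(0, len(x) - m + 1, hop):
              block = x[start:start + nfft]
              if len(block) < nfft:
                  block = np.pad(block, (0, nfft - len(block)))
              yblk = np.fft.ifft(np.fft.fft(block) * H).real
              y.append(yblk[m - 1:])              # drop the circularly aliased prefix
          return np.concatenate(y)[:len(x) - (m - 1)]

      x = np.random.randn(1000); h = np.ones(8) / 8
      assert np.allclose(overlap_save(x, h), np.convolve(x, h)[:len(x)], atol=1e-9)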

  4. Parallel Anisotropic Tetrahedral Adaptation

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Darmofal, David L.

    2008-01-01

    An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic are described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000:1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach, as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.

  5. Homology, convergence and parallelism.

    PubMed

    Ghiselin, Michael T

    2016-01-01

    Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. PMID:26598721

  6. Parallel grid population

    DOEpatents

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
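
    A small Python sketch of the two-phase scheme described in the claim (the one-dimensional interval grid, object shapes, and helper names are all hypothetical): each processor first bins its distinct object set against every grid portion, and the results are then merged so that each portion owner can populate its portion.

      from collections import defaultdict
      from concurrent.futures import ProcessPoolExecutor

      N = 4                                                 # processors = grid portions
      portions = [(i / N, (i + 1) / N) for i in range(N)]   # 1-D grid portions

      def bounds_of(obj_set):
          """Phase 1: find which portions partially bound each object."""
          hits = defaultdict(list)
          for lo, hi in obj_set:                            # each object is an interval
              for p, (plo, phi) in enumerate(portions):
                  if lo < phi and hi > plo:                 # overlaps portion p
                      hits[p].append((lo, hi))
          return hits

      objects = [(0.1, 0.2), (0.35, 0.6), (0.7, 0.95), (0.45, 0.5)]
      sets = [objects[i::N] for i in range(N)]              # distinct set per processor

      if __name__ == "__main__":
          grid = defaultdict(list)
          with ProcessPoolExecutor(N) as ex:
              for hits in ex.map(bounds_of, sets):          # phase 1, in parallel
                  for p, objs in hits.items():
                      grid[p].extend(objs)                  # phase 2: owners populate
          print(dict(grid))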

  7. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Gryphon, Coranth D.; Miller, Mark D.

    1991-01-01

    PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.

  8. Parallel multilevel preconditioners

    SciTech Connect

    Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.

    1989-01-01

    In this paper, we shall report on some techniques for the development of preconditioners for the discrete systems which arise in the approximation of solutions to elliptic boundary value problems. Here we shall only state the resulting theorems. It has been demonstrated that preconditioned iteration techniques often lead to the most computationally effective algorithms for the solution of the large algebraic systems corresponding to boundary value problems in two and three dimensional Euclidean space. The use of preconditioned iteration will become even more important on computers with parallel architecture. This paper discusses an approach for developing completely parallel multilevel preconditioners. In order to illustrate the resulting algorithms, we shall describe the simplest application of the technique to a model elliptic problem.
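
    For readers unfamiliar with where a preconditioner enters the iteration, a generic preconditioned conjugate-gradient sketch in Python follows (a simple diagonal preconditioner stands in for the parallel multilevel preconditioners the paper develops):

      import numpy as np

      def pcg(A, b, M_inv, tol=1e-10, maxit=500):
          """Conjugate gradients with preconditioner application z = M_inv(r)."""
          x = np.zeros_like(b)
          r = b - A @ x
          z = M_inv(r)
          p, rz = z.copy(), r @ z
          for _ in range(maxit):
              Ap = A @ p
              alpha = rz / (p @ Ap)
              x += alpha * p
              r -= alpha * Ap
              if np.linalg.norm(r) < tol:
                  break
              z = M_inv(r)
              rz_new = r @ z
              p = z + (rz_new / rz) * p
              rz = rz_new
          return x

      n = 50                                 # 1-D model elliptic problem
      A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
      x = pcg(A, np.ones(n), M_inv=lambda r: r / np.diag(A))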

  9. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Painter, J.; Hansen, C.

    1996-10-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.

  10. Xyce parallel electronic simulator.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, to the extent possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  11. ASSEMBLY OF PARALLEL PLATES

    DOEpatents

    Groh, E.F.; Lennox, D.H.

    1963-04-23

    This invention is concerned with a rigid assembly of parallel plates in which keyways are stamped out along the edges of the plates and a self-retaining key is inserted into aligned keyways. Spacers having similar keyways are included between adjacent plates. The entire assembly is locked into a rigid structure by fastening only the outermost plates to the ends of the keys. (AEC)

  12. Adaptive parallel logic networks

    SciTech Connect

    Martinez, T.R.; Vidal, J.J.

    1988-02-01

    This paper presents a novel class of special purpose processors referred to as ASOCS (adaptive self-organizing concurrent systems). Intended applications include adaptive logic devices, robotics, process control, system malfunction management, and in general, applications of logic reasoning. ASOCS combines massive parallelism with self-organization to attain a distributed mechanism for adaptation. The ASOCS approach is based on an adaptive network composed of many simple computing elements (nodes) which operate in a combinational and asynchronous fashion. Problem specification (programming) is obtained by presenting to the system if-then rules expressed as Boolean conjunctions. New rules are added incrementally. In the current model, when conflicts occur, precedence is given to the most recent inputs. With each rule, desired network response is simply presented to the system, following which the network adjusts itself to maintain consistency and parsimony of representation. Data processing and adaptation form two separate phases of operation. During processing, the network acts as a parallel hardware circuit. Control of the adaptive process is distributed among the network nodes and efficiently exploits parallelism.

  13. Trajectory optimization using parallel shooting method on parallel computer

    SciTech Connect

    Wirthman, D.J.; Park, S.Y.; Vadali, S.R.

    1995-03-01

    The efficiency of a parallel shooting method on a parallel computer for solving a variety of optimal control guidance problems is studied. Several examples are considered to demonstrate that a speedup of nearly 7 to 1 is achieved with the use of 16 processors. It is suggested that further improvements in performance can be achieved by parallelizing in the state domain. 10 refs.

  14. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    SciTech Connect

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  15. Global Arrays Parallel Programming Toolkit

    SciTech Connect

    Nieplocha, Jaroslaw; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Harrison, Robert J.; Chavarría-Miranda, Daniel

    2011-01-01

    The two predominant classes of programming models for parallel computing are distributed memory and shared memory. Both shared memory and distributed memory models have advantages and shortcomings. The shared memory model is much easier to use, but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic can have a negative impact on performance and scalability. Careful code restructuring to increase data reuse and replacing fine grain load/stores with block access to shared data can address the problem and yield performance for shared memory that is competitive with message-passing. However, this performance comes at the cost of compromising the ease of use that the shared memory model advertises. Distributed memory models, such as message-passing or one-sided communication, offer performance and scalability but they are difficult to program. The Global Arrays toolkit attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be specified by the programmer and hence managed. GA is related to the global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran. In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF, ZPL, and Data Parallel C. However, the Global Array programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving
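
    The real toolkit is a compiled library with C and Fortran bindings, but its programming model can be caricatured in a few lines of Python (a toy stand-in, not the GA API): data lives in a partitioned "global" array, and the programmer explicitly moves blocks between global and local storage, which is exactly where locality is managed.

      import numpy as np

      class ToyGlobalArray:
          """Toy GA-style array: blocks are 'owned' by ranks."""
          def __init__(self, n, nranks):
              self.blocks = np.array_split(np.zeros(n), nranks)

          def get(self, rank):
              return self.blocks[rank].copy()   # fetch into local storage

          def put(self, rank, local):
              self.blocks[rank][:] = local      # publish local results

      ga = ToyGlobalArray(n=16, nranks=4)
      buf = ga.get(2)           # move remote block to local storage
      buf += 1.0                # compute on fast local data
      ga.put(2, buf)            # write back to the global address space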

  16. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.

  17. Resistor Combinations for Parallel Circuits.

    ERIC Educational Resources Information Center

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
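
    The rule behind such tables is 1/R_t = 1/R_1 + 1/R_2 for two resistors in parallel; a few lines of Python (an illustrative reconstruction, not the article's own tables) enumerate pairs whose combined resistance is a whole number.

      from fractions import Fraction

      def parallel(r1, r2):
          """Total resistance of r1 and r2 in parallel, kept exact."""
          return Fraction(r1 * r2, r1 + r2)

      pairs = [(r1, r2) for r1 in range(1, 25) for r2 in range(r1, 25)
               if parallel(r1, r2).denominator == 1]
      print(pairs[:5])          # e.g. (3, 6) -> 2 ohms, (4, 12) -> 3 ohms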

  18. Parallel Pascal - An extended Pascal for parallel computers

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.

    1984-01-01

    Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.

  19. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines. Of the current architectures, those designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

  20. Parallel Eclipse Project Checkout

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.

    2011-01-01

    Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (xml file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to checkout the plug-ins, this program makes the plug-in checkouts parallel. After parsing the feature, a checkout request is issued for each plug-in in the feature. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to checkout now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer's home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer's home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. It can be applied to any
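
    The core optimization translates directly into a few lines in most languages; here is a Python sketch of the same shape (plug-in names and the checkout body are placeholders): the feature's plug-in list is fanned out to a bounded thread pool so the downloads overlap.

      from concurrent.futures import ThreadPoolExecutor

      def checkout(plugin):
          # placeholder for the version-control checkout of one plug-in
          print("checked out", plugin)

      plugins = ["core.ui", "core.net", "rover.viz", "rover.cmd"]  # parsed from the feature

      with ThreadPoolExecutor(max_workers=8) as pool:   # configurable thread count
          list(pool.map(checkout, plugins))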

  1. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Hansen, C.; Painter, J.; de Verdiere, G.C.

    1995-05-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.
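
    The composite step can be sketched independently of the renderer (a simple sequential merge is shown; the paper composites the partial images with an optimal parallel method): each worker produces a color image plus a depth buffer for its share of the spheres, and pixels are merged by depth.

      import numpy as np

      def composite(partials):
          """Merge (color, depth) partial images, nearest depth winning."""
          color, depth = partials[0]
          for c, d in partials[1:]:
              closer = d < depth
              color = np.where(closer[..., None], c, color)
              depth = np.minimum(d, depth)
          return color

      h, w = 64, 64
      partials = [(np.random.rand(h, w, 3), np.random.rand(h, w)) for _ in range(4)]
      image = composite(partials)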

  2. Fastpath Speculative Parallelization

    NASA Astrophysics Data System (ADS)

    Spear, Michael F.; Kelsey, Kirk; Bai, Tongxin; Dalessandro, Luke; Scott, Michael L.; Ding, Chen; Wu, Peng

    We describe Fastpath, a system for speculative parallelization of sequential programs on conventional multicore processors. Our system distinguishes between the lead thread, which executes at almost-native speed, and speculative threads, which execute somewhat slower. This allows us to achieve nontrivial speedup, even on two-core machines. We present a mathematical model of potential speedup, parameterized by application characteristics and implementation constants. We also present preliminary results gleaned from two different Fastpath implementations, each derived from an implementation of software transactional memory.

  3. Synchronous Parallel Kinetic Monte Carlo

    SciTech Connect

    Martínez, E; Marian, J; Kalos, M H

    2006-12-14

    A novel parallel kinetic Monte Carlo (kMC) algorithm formulated on the basis of perfect time synchronicity is presented. The algorithm provides an exact generalization of any standard serial kMC model and is trivially implemented in parallel architectures. We demonstrate the mathematical validity and parallel performance of the method by solving several well-understood problems in diffusion.
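
    One standard device for achieving exact time synchronicity is a null-event construction; the sketch below illustrates that general idea (and not necessarily the authors' precise formulation): every domain's event catalogue is padded up to a shared maximum rate, so all domains advance on one global clock and spend the padding probability on do-nothing events.

      import math, random

      def synchronous_kmc_step(local_rates, rmax):
          """One synchronous step across all domains; padded rate rmax."""
          dt = -math.log(1.0 - random.random()) / rmax    # shared time increment
          fired = [random.random() < r / rmax for r in local_rates]
          return dt, fired                                # fired[k]: real event in domain k

      t, rates = 0.0, [0.8, 0.3, 0.5, 0.2]
      for _ in range(5):
          dt, fired = synchronous_kmc_step(rates, rmax=max(rates))
          t += dt                                         # null events preserve synchrony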

  4. CSM parallel structural methods research

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.

    1989-01-01

    Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.

  5. Roo: A parallel theorem prover

    SciTech Connect

    Lusk, E.L.; McCune, W.W.; Slaney, J.K.

    1991-11-01

    We describe a parallel theorem prover based on the Argonne theorem-proving system OTTER. The parallel system, called Roo, runs on shared-memory multiprocessors such as the Sequent Symmetry. We explain the parallel algorithm used and give performance results that demonstrate near-linear speedups on large problems.

  6. Programming parallel architectures - The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  7. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  8. Sequential and Parallel Algorithms for Spherical Interpolation

    NASA Astrophysics Data System (ADS)

    De Rossi, Alessandra

    2007-09-01

    Given a large set of scattered points on a sphere and their associated real values, we analyze sequential and parallel algorithms for the construction of a function defined on the sphere satisfying the interpolation conditions. The algorithms we implemented are based on a local interpolation method using spherical radial basis functions and the Inverse Distance Weighted method. Several numerical results show accuracy and efficiency of the algorithms.
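
    Of the two local methods mentioned, the Inverse Distance Weighted one is compact enough to sketch (a minimal Python version using great-circle distance; the spherical radial basis function variant is omitted):

      import numpy as np

      def great_circle(lon1, lat1, lon2, lat2):
          """Angular distance between points given in radians."""
          return np.arccos(np.clip(
              np.sin(lat1) * np.sin(lat2) +
              np.cos(lat1) * np.cos(lat2) * np.cos(lon1 - lon2), -1.0, 1.0))

      def idw(lon, lat, data_lon, data_lat, values, power=2.0, eps=1e-12):
          d = great_circle(lon, lat, data_lon, data_lat)
          if np.any(d < eps):                   # query coincides with a data point
              return values[np.argmin(d)]
          w = 1.0 / d ** power
          return np.sum(w * values) / np.sum(w)

      rng = np.random.default_rng(0)
      lons = rng.uniform(0, 2 * np.pi, 100)
      lats = rng.uniform(-np.pi / 2, np.pi / 2, 100)
      print(idw(1.0, 0.3, lons, lats, np.sin(lats)))   # roughly sin(0.3)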

  9. Imaging with parallel ray-rotation sheets.

    PubMed

    Hamilton, Alasdair C; Courtial, Johannes

    2008-12-01

    A ray-rotation sheet consists of miniaturized optical components that function--ray optically--as a homogeneous medium that rotates the local direction of transmitted light rays around the sheet normal by an arbitrary angle [A. C. Hamilton et al., arXiv:0809.2646 (2008)]. Here we show that two or more parallel ray-rotation sheets perform imaging between two planes. The image is unscaled and un-rotated. No other planes are imaged. When seen through parallel ray-rotation sheets, planes that are not imaged appear rotated. PMID:19065221

  10. Electron parallel closures for arbitrary collisionality

    SciTech Connect

    Ji, Jeong-Young; Held, Eric D.

    2014-12-15

    Electron parallel closures for heat flow, viscosity, and friction force are expressed as kernel-weighted integrals of thermodynamic drives, the temperature gradient, relative electron-ion flow velocity, and flow-velocity gradient. Simple, fitted kernel functions are obtained for arbitrary collisionality from the 6400 moment solution and the asymptotic behavior in the collisionless limit. The fitted kernels circumvent having to solve higher order moment equations in order to close the electron fluid equations. For this reason, the electron parallel closures provide a useful and general tool for theoretical and computational models of astrophysical and laboratory plasmas.
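
    Schematically, a closure of this kernel-weighted form can be written as (notation illustrative, not the paper's exact expression):

      q_\parallel(\ell) \;=\; -\, n \int K_q(\ell - \ell') \, \frac{\partial T}{\partial \ell'} \, \mathrm{d}\ell'

    where the fitted kernel K_q carries the collisionality dependence, reducing to a local response in the collisional limit and a nonlocal one in the collisionless limit.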

  11. Constructing higher order DNA origami arrays using DNA junctions of anti-parallel/parallel double crossovers

    NASA Astrophysics Data System (ADS)

    Ma, Zhipeng; Park, Seongsu; Yamashita, Naoki; Kawai, Kentaro; Hirai, Yoshikazu; Tsuchiya, Toshiyuki; Tabata, Osamu

    2016-06-01

    DNA origami provides a versatile method for the construction of nanostructures with defined shape, size and other properties; such nanostructures may enable a hierarchical assembly of large scale architecture for the placement of other nanomaterials with atomic precision. However, the effective use of these higher order structures as functional components depends on knowledge of their assembly behavior and mechanical properties. This paper demonstrates construction of higher order DNA origami arrays with controlled orientations based on the formation of two types of DNA junctions: anti-parallel and parallel double crossovers. A two-step assembly process, in which preformed rectangular DNA origami monomer structures themselves undergo further self-assembly to form numerically unlimited arrays, was investigated to reveal the influences of assembly parameters. AFM observations showed that when parallel double crossover DNA junctions are used, the assembly of DNA origami arrays occurs with fewer monomers than for structures formed using anti-parallel double crossovers, given the same assembly parameters, indicating that the configuration of parallel double crossovers is not energetically preferred. However, the direct measurement by AFM force-controlled mapping shows that both DNA junctions of anti-parallel and parallel double crossovers have homogeneous mechanical stability with any part of DNA origami.

  12. Making parallel lines meet

    PubMed Central

    Baskin, Tobias I.; Gu, Ying

    2012-01-01

    The extracellular matrix is constructed beyond the plasma membrane, challenging mechanisms for its control by the cell. In plants, the cell wall is highly ordered, with cellulose microfibrils aligned coherently over a scale spanning hundreds of cells. To a considerable extent, deploying aligned microfibrils determines mechanical properties of the cell wall, including strength and compliance. Cellulose microfibrils have long been seen to be aligned in parallel with an array of microtubules in the cell cortex. How do these cortical microtubules affect the cellulose synthase complex? This question has stood for as many years as the parallelism between the elements has been observed, but now an answer is emerging. Here, we review recent work establishing that the link between microtubules and microfibrils is mediated by a protein named cellulose synthase-interacting protein 1 (CSI1). The protein binds both microtubules and components of the cellulose synthase complex. In the absence of CSI1, microfibrils are synthesized but their alignment becomes uncoupled from the microtubules, an effect that is phenocopied in the wild type by depolymerizing the microtubules. The characterization of CSI1 significantly enhances knowledge of how cellulose is aligned, a process that serves as a paradigmatic example of how cells dictate the construction of their extracellular environment. PMID:22902763

  13. Applied Parallel Metadata Indexing

    SciTech Connect

    Jacobi, Michael R

    2012-08-01

    The GPFS Archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and search the archive using standard and user-defined metadata, while maintaining security. Last summer, the author developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, the author implemented security and developed the user interface for the search tool. To meet security requirements, each database table is associated with a single user, stores only records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.

  14. Tolerant (parallel) Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2³ is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

  15. Massively Parallel QCD

    SciTech Connect

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-04-11

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.

  16. Parallel ptychographic reconstruction

    PubMed Central

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-01-01

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can image extended objects at a resolution limited by the scattering strength of the object and the detector geometry, rather than by an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source. PMID:25607174

  17. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
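
    A conceptual analogue in ordinary threading terms (hypothetical names, not the PAMI API): the advance call sleeps its thread on a condition variable while no events are pending, and a newly arrived event for the context awakens it.

      import threading

      class Context:
          def __init__(self):
              self.events = []
              self.cond = threading.Condition()

          def advance(self):
              with self.cond:
                  while not self.events:     # no actionable events pending
                      self.cond.wait()       # place thread into a wait state
                  return self.events.pop(0)  # awakened: process the event

          def post(self, event):
              with self.cond:
                  self.events.append(event)
                  self.cond.notify()         # awaken the waiting advance call

      ctx = Context()
      t = threading.Thread(target=lambda: print("processed:", ctx.advance()))
      t.start()
      ctx.post("data-arrived")
      t.join()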

  18. Loop parallelism on Tera MTA using SISAL

    SciTech Connect

    Mitrovic, S.

    1995-11-01

    The difficulty of programming parallel computers has impeded their widespread use. The problems are caused by existing hardware and software tools. The software problems on shared-memory and vector computers can be solved by using deterministic high-performance functional languages like SISAL. Distributed-memory computers have even more obstacles than shared-memory parallel machines. Research indicates that multithreaded architectures can hide the long latency of distributed memories and that they can solve the problems of locality. Tera's MTA multiprocessor is based on the concept of multithreading and provides the programmer with a real shared-memory model. This paper investigates the performance of parallel loops written in SISAL and executed on the Tera MTA using the Livermore Loops benchmarks.

  19. Parallel Algorithms For Optical Digital Computers

    NASA Astrophysics Data System (ADS)

    Huang, Alan

    1983-04-01

    Conventional computers suffer from several communication bottlenecks which fundamentally limit their performance. These bottlenecks are characterized by an address-dependent sequential transfer of information which arises from the need to time-multiplex information over a limited number of interconnections. An optical digital computer based on a classical finite state machine can be shown to be free of these bottlenecks. Such a processor would be unique since it would be capable of modifying its entire state space each cycle while conventional computers can only alter a few bits. New algorithms are needed to manage and use this capability. A technique based on recognizing a particular symbol in parallel and replacing it in parallel with another symbol is suggested. Examples using this parallel symbolic substitution to perform binary addition and binary incrementation are presented. Applications involving Boolean logic, functional programming languages, production rule driven artificial intelligence, and molecular chemistry are also discussed.
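
    The flavor of the technique fits in two lines of Python: a half-adder rule rewrites every bit position at once, and iterating the substitution propagates the carries (an optical implementation would apply the rewrite to all symbol positions simultaneously).

      def add_by_substitution(a, b):
          """Binary addition by repeated parallel substitution of sum/carry."""
          while b:
              a, b = a ^ b, (a & b) << 1   # rewrite all positions at once
          return a

      assert add_by_substitution(0b1011, 0b0110) == 0b1011 + 0b0110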

  20. Simulating Billion-Task Parallel Programs

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J

    2014-01-01

    In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.

  1. Grundy - Parallel processor architecture makes programming easy

    NASA Technical Reports Server (NTRS)

    Meier, R. J., Jr.

    1985-01-01

    The hardware, software, and firmware of the parallel processor, Grundy, are examined. The Grundy processor uses a simple processor that has a totally orthogonal three-address instruction set. The system contains a relative and indirect processing mode to support the high-level language, and uses pseudoprocessors and read-only memory. The system supports a high-level language in which arbitrary degrees of algorithmic parallelism are expressed. The functions of the compiler and invocation frame are described. Grundy uses an operating system that can be accessed by an arbitrary number of processes simultaneously, and the access time grows only as the logarithm of the number of active processes. Applications for the parallel processor are discussed.

  2. Parallel Mechanisms for Visual Search in Zebrafish

    PubMed Central

    Proulx, Michael J.; Parker, Matthew O.; Tahir, Yasser; Brennan, Caroline H.

    2014-01-01

    Parallel visual search mechanisms have been reported previously only in mammals and birds, and not animals lacking an expanded telencephalon such as bees. Here we report the first evidence for parallel visual search in fish using a choice task where the fish had to find a target amongst an increasing number of distractors. Following two-choice discrimination training, zebrafish were presented with the original stimulus within an increasing array of distractor stimuli. We found that zebrafish exhibit no significant change in accuracy and approach latency as the number of distractors increased, providing evidence of parallel processing. This evidence challenges theories of vertebrate neural architecture and the importance of an expanded telencephalon for the evolution of executive function. PMID:25353168

  3. Extending HPF for advanced data parallel applications

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

    1994-01-01

    The stated goal of High Performance Fortran (HPF) was to 'address the problems of writing data parallel programs where the distribution of data affects performance'. After examining the current version of the language we are led to the conclusion that HPF has not fully achieved this goal. While the basic distribution functions offered by the language - regular block, cyclic, and block cyclic distributions - can support regular numerical algorithms, advanced applications such as particle-in-cell codes or unstructured mesh solvers cannot be expressed adequately. We believe that this is a major weakness of HPF, significantly reducing its chances of becoming accepted in the numeric community. The paper discusses the data distribution and alignment issues in detail, points out some flaws in the basic language, and outlines possible future paths of development. Furthermore, we briefly deal with the issue of task parallelism and its integration with the data parallel paradigm of HPF.
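
    The regular mappings the language does provide are easy to state as index-to-owner functions (a plain Python rendering, not HPF syntax), which also makes the limitation visible: none of them expresses an irregular, data-dependent distribution.

      def block_owner(i, n, p):
          return i // -(-n // p)          # contiguous blocks of ceil(n/p)

      def cyclic_owner(i, p):
          return i % p                    # round-robin assignment

      def block_cyclic_owner(i, blk, p):
          return (i // blk) % p           # round-robin over fixed-size blocks

      n, p = 16, 4
      print([block_owner(i, n, p) for i in range(n)])
      print([cyclic_owner(i, p) for i in range(n)])
      print([block_cyclic_owner(i, 2, p) for i in range(n)])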

  4. Parallel algorithms for optical digital computers

    SciTech Connect

    Huang, A.

    1983-01-01

    Conventional computers suffer from several communication bottlenecks which fundamentally limit their performance. These bottlenecks are characterised by an address-dependent sequential transfer of information which arises from the need to time-multiplex information over a limited number of interconnections. An optical digital computer based on a classical finite state machine can be shown to be free of these bottlenecks. Such a processor would be unique since it would be capable of modifying its entire state space each cycle while conventional computers can only alter a few bits. New algorithms are needed to manage and use this capability. A technique based on recognising a particular symbol in parallel and replacing it in parallel with another symbol is suggested. Examples using this parallel symbolic substitution to perform binary addition and binary incrementation are presented. Applications involving Boolean logic, functional programming languages, production rule driven artificial intelligence, and molecular chemistry are also discussed. 12 references.

  5. Detecting opportunities for parallel observations on the Hubble Space Telescope

    NASA Technical Reports Server (NTRS)

    Lucks, Michael

    1992-01-01

    The presence of multiple scientific instruments aboard the Hubble Space Telescope provides opportunities for parallel science, i.e., the simultaneous use of different instruments for different observations. Determining whether candidate observations are suitable for parallel execution depends on numerous criteria (some involving quantitative tradeoffs) that may change frequently. A knowledge based approach is presented for constructing a scoring function to rank candidate pairs of observations for parallel science. In the Parallel Observation Matching System (POMS), spacecraft knowledge and schedulers' preferences are represented using a uniform set of mappings, or knowledge functions. Assessment of parallel science opportunities is achieved via composition of the knowledge functions in a prescribed manner. The knowledge acquisition and explanation facilities of the system are presented. The methodology is applicable to many other multiple-criteria assessment problems.
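
    The shape of the approach can be sketched with entirely hypothetical knowledge functions and weights: each mapping scores one criterion for a candidate pair, and the assessment composes the mappings in a prescribed way (a weighted sum is used here purely for illustration).

      def same_target(pair):      return 1.0 if pair["targets_compatible"] else 0.0
      def instrument_free(pair):  return 1.0 if pair["instruments_disjoint"] else 0.0
      def scheduler_pref(pair):   return pair["priority"]           # already in [0, 1]

      KNOWLEDGE = [(same_target, 0.4), (instrument_free, 0.4), (scheduler_pref, 0.2)]

      def score(pair):
          """Compose the knowledge functions into one ranking score."""
          return sum(w * f(pair) for f, w in KNOWLEDGE)

      candidates = [
          {"targets_compatible": True, "instruments_disjoint": True,  "priority": 0.7},
          {"targets_compatible": True, "instruments_disjoint": False, "priority": 0.9},
      ]
      ranked = sorted(candidates, key=score, reverse=True)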

  6. A systolic array parallelizing compiler

    SciTech Connect

    Tseng, P.S.

    1990-01-01

    This book presents a completely new approach to the problem of the systolic array parallelizing compiler. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler, which can generate efficient parallel code for complete LINPACK routines. This book begins by analyzing the architectural strength of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. A complete listing of the source program and compiler-generated parallel code are given to clarify the overall picture of the compiler. The book concludes that a systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.

  7. Parallel Computing in SCALE

    SciTech Connect

    DeHart, Mark D; Williams, Mark L; Bowman, Stephen M

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

  8. Unified Parallel Software

    Energy Science and Technology Software Center (ESTSC)

    2003-12-01

    UPS (Unified Parallel Software) is a collection of software tools (libraries, scripts, executables) that assist in parallel programming. This consists of:
      o libups.a: C/Fortran callable routines for message passing (utilities written on top of MPI) and file IO (utilities written on top of HDF).
      o libuserd-HDF.so: EnSight user-defined reader for visualizing data files written with UPS File IO.
      o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl: executables/scripts to get information from data files and to simplify the use of EnSight on those data files.
      o ups_io_rm/ups_io_cp: manipulate data files written with UPS File IO.
    These tools are portable to a wide variety of Unix platforms.

  9. Unified Parallel Software

    SciTech Connect

    McKay, Mike

    2003-12-01

    UPS (Unified Parallel Software) is a collection of software tools (libraries, scripts, and executables) that assist in parallel programming. It consists of:
    o libups.a: C/Fortran-callable routines for message passing (utilities written on top of MPI) and file I/O (utilities written on top of HDF).
    o libuserd-HDF.so: an EnSight user-defined reader for visualizing data files written with UPS file I/O.
    o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl: executables/scripts that extract information from data files and simplify the use of EnSight on those data files.
    o ups_io_rm/ups_io_cp: utilities to manipulate data files written with UPS file I/O.
    These tools are portable to a wide variety of Unix platforms.

  10. Parallel Polarization State Generation

    PubMed Central

    She, Alan; Capasso, Federico

    2016-01-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated as a series of linear transformations, i.e., a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially separated polarization components of a laser with a digital micromirror device and subsequently beam-combining them. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics, with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813
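
    A schematic way to see the contrast (notation assumed here for illustration, not drawn from the paper): write the output state as a Stokes vector obtained from an input state through polarization-transforming matrices M_k. The serial architecture composes the transformations, while the parallel architecture weights and sums them, with time-varying weights a_k(t) supplied by intensity modulation of the spatially separated components:

      \mathbf{S}_{\mathrm{out}} = M_N \cdots M_2\, M_1\, \mathbf{S}_{\mathrm{in}} \qquad \text{(serial: product of matrices)}

      \mathbf{S}_{\mathrm{out}} = \Big( \sum_{k=1}^{N} a_k(t)\, M_k \Big)\, \mathbf{S}_{\mathrm{in}} \qquad \text{(parallel: sum of matrices)}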

  11. Parallel tridiagonal equation solvers

    NASA Technical Reports Server (NTRS)

    Stone, H. S.

    1974-01-01

    Three parallel algorithms were compared for the direct solution of tridiagonal linear systems of equations. The algorithms are suitable for computers such as ILLIAC 4 and CDC STAR. For array computers similar to ILLIAC 4, cyclic odd-even reduction has the least operation count for highly structured sets of equations, and recursive doubling has the least count for relatively unstructured sets of equations. Since the difference in operation counts for these two algorithms is not substantial, their relative running times may be more related to overhead operations, which are not measured in this paper. The third algorithm, based on Buneman's Poisson solver, has more arithmetic operations than the others, and appears to be the least favorable. For pipeline computers similar to CDC STAR, cyclic odd-even reduction appears to be the most preferable algorithm for all cases.
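
    As a concrete illustration of the first algorithm, here is a minimal serial NumPy sketch of cyclic odd-even reduction (a generic textbook formulation, not the paper's code; the conventions a[0] = 0 and c[-1] = 0 are assumed). On an ILLIAC-4-style array machine, all eliminations within one level, and all back-substitutions, are mutually independent and would proceed in parallel, giving O(log n) parallel steps.

      import numpy as np

      def cyclic_reduction(a, b, c, d):
          # Tridiagonal solve by cyclic odd-even reduction.
          # a: sub-diagonal (a[0] must be 0), b: diagonal,
          # c: super-diagonal (c[-1] must be 0), d: right-hand side.
          n = len(b)
          if n == 1:
              return d / b
          i = np.arange(1, n, 2)                  # unknowns kept at the next level
          alpha = -a[i] / b[i - 1]
          gamma = np.zeros(len(i))
          right = i + 1 < n                       # kept rows that have a right neighbor
          ir = i[right]
          gamma[right] = -c[ir] / b[ir + 1]
          b2 = b[i] + alpha * c[i - 1]            # eliminate the odd neighbors
          d2 = d[i] + alpha * d[i - 1]
          a2 = alpha * a[i - 1]
          c2 = np.zeros(len(i))
          b2[right] += gamma[right] * a[ir + 1]
          d2[right] += gamma[right] * d[ir + 1]
          c2[right] = gamma[right] * c[ir + 1]
          x2 = cyclic_reduction(a2, b2, c2, d2)   # O(log n) levels overall
          x = np.zeros(n)
          x[i] = x2
          xp = np.zeros(n + 2)                    # zero-padded for uniform indexing
          xp[1:-1] = x
          j = np.arange(0, n, 2)                  # back-substitute eliminated unknowns
          x[j] = (d[j] - a[j] * xp[j] - c[j] * xp[j + 2]) / b[j]
          return x

      # quick check against a dense solve
      rng = np.random.default_rng(1)
      n = 12
      b = 4.0 + rng.random(n)                     # diagonally dominant for stability
      a, c, d = rng.random(n), rng.random(n), rng.random(n)
      a[0] = c[-1] = 0.0
      A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
      assert np.allclose(cyclic_reduction(a, b, c, d), np.linalg.solve(A, d))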

  12. Parallel Polarization State Generation

    NASA Astrophysics Data System (ADS)

    She, Alan; Capasso, Federico

    2016-05-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated as a series of linear transformations, i.e., a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially separated polarization components of a laser with a digital micromirror device and subsequently beam-combining them. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics, with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.

  13. Parallel Polarization State Generation.

    PubMed

    She, Alan; Capasso, Federico

    2016-01-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated as a series of linear transformations, i.e., a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially separated polarization components of a laser with a digital micromirror device and subsequently beam-combining them. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics, with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813

  14. Parallel Imaging Microfluidic Cytometer

    PubMed Central

    Ehrlich, Daniel J.; McKenna, Brian K.; Evans, James G.; Belkina, Anna C.; Denis, Gerald V.; Sherr, David; Cheung, Man Ching

    2011-01-01

    By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of flow cytometry (FACS) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1-D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in approximately 6–10 minutes, about 30 times the speed of most current FACS systems. In 1-D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of CCD-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. PMID:21704835

  15. The parallel I/O architecture of the high performance storage system (HPSS). Revision 1

    SciTech Connect

    Watson, R.W.; Coyne, R.A.

    1995-04-01

    Datasets up to terabyte size and petabyte capacities have created a serious imbalance between I/O and storage system performance and system functionality. One promising approach is the use of parallel data transfer techniques for client access to storage, peripheral-to-peripheral transfers, and remote file transfers. This paper describes the parallel I/O architecture and mechanisms, Parallel Transport Protocol (PTP), parallel FTP, and parallel client Application Programming Interface (API) used by the High Performance Storage System (HPSS). Parallel storage integration issues with a local parallel file system are also discussed.

  16. Parallelizing OVERFLOW: Experiences, Lessons, Results

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.

    1999-01-01

    The computer code OVERFLOW is widely used in the aerodynamics community for the numerical solution of the Navier-Stokes equations. Current trends in computer systems and architectures are toward multiple processors and parallelism, including distributed memory. This report describes work carried out by the author and others at Ames Research Center with the goal of parallelizing OVERFLOW using a variety of parallel architectures and parallelization strategies. The paper begins with a brief description of the OVERFLOW code, covering the basic numerical algorithm and some software engineering considerations. It then describes two parallel versions of OVERFLOW: OVERFLOW/PVM, which uses PVM (Parallel Virtual Machine) in the manager/worker style and is part of the standard OVERFLOW distribution, and OVERFLOW/MPI, which uses MPI (Message Passing Interface) in the SPMD (Single Program Multiple Data) style. The paper concludes with a discussion of alternatives to explicit message passing in the context of parallelizing OVERFLOW.
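
    A toy Python sketch of the manager/worker style named above (not OVERFLOW's actual code; the zone decomposition and work function are invented for illustration). The pool plays the manager, handing grid zones to whichever worker is idle:

      import multiprocessing as mp

      def relax_zone(zone_id):
          # stand-in for one implicit solver sweep over a grid zone
          acc = sum(i * i for i in range(50_000))
          return zone_id, acc

      if __name__ == "__main__":
          zones = range(16)                      # hypothetical zone decomposition
          with mp.Pool(processes=4) as pool:     # four workers
              # imap_unordered returns results as workers finish, i.e. dynamic
              # load balancing, the main attraction of the manager/worker style
              for zone_id, _ in pool.imap_unordered(relax_zone, zones):
                  print(f"zone {zone_id} done")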

  17. The BLAZE language - A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
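
    The flavor of these features can be suggested with a NumPy analogue (this is not BLAZE syntax, merely an illustration of the same fine-grained, whole-array style):

      import numpy as np

      x = np.linspace(0.0, 1.0, 1_000)
      # whole-array arithmetic: one conceptual "forall i" with no loop dependence,
      # hence fine-grained parallelism a compiler can exploit directly
      y = np.sin(np.pi * x) * np.exp(-x)
      # APL-style accumulation operator: a running (prefix) sum over the array
      prefix = np.add.accumulate(y)
      total = y.sum()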

  18. The BLAZE language: A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.

  19. Parallelization and automatic data distribution for nuclear reactor simulations

    SciTech Connect

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed-of-light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel, with only adjacent components directly affecting each other. They do not occur in the sequentialized manner, with global instantaneous effects, that is often assumed in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  20. PMESH: A parallel mesh generator

    SciTech Connect

    Hardin, D.D.

    1994-10-21

    The Parallel Mesh Generation (PMESH) Project is a joint LDRD effort by A Division and Engineering to develop a unique mesh generation system that can construct large calculational meshes (of up to 10^9 elements) on massively parallel computers. Such a capability will remove a critical roadblock to unleashing the power of massively parallel processors (MPPs) for physical analysis. PMESH will support a variety of LLNL 3-D physics codes in the areas of electromagnetics, structural mechanics, thermal analysis, and hydrodynamics.

  1. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Lau, Sonie

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of the parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.

  2. Parallel computation with the force

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1985-01-01

    A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, their implementation on different architectures, and their performance implications.
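
    In rough Python terms (the force itself is a set of macros generating multiprocessor FORTRAN, so this is only an analogue), every member of the force executes the same program text, splits loop iterations by its own index, and meets the others at synchronization points:

      import multiprocessing as mp

      def member(rank, size, barrier):
          # same program text for every process, in the SPMD spirit of the force;
          # the loop is strip-mined by process index
          partial = sum(k * k for k in range(rank, 1_000, size))
          barrier.wait()                        # force-wide synchronization point
          print(f"process {rank} of {size}: partial = {partial}")

      if __name__ == "__main__":
          size = 4                              # the force size is a runtime parameter
          barrier = mp.Barrier(size)
          procs = [mp.Process(target=member, args=(r, size, barrier))
                   for r in range(size)]
          for p in procs:
              p.start()
          for p in procs:
              p.join()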

  3. A Programmable Preprocessor for Parallelizing Fortran-90

    SciTech Connect

    Rosing, Matthew; Yabusaki, Steven B.

    1999-07-01

    A programmable preprocessor that generates portable and efficient parallel Fortran-90 code has been successfully used in the development of a variety of environmental transport simulators for the Department of Energy. The tool provides the basic functionality of a traditional preprocessor where directives are embedded in a serial Fortran program and interpreted by the preprocessor to produce parallel Fortran code with MPI calls. The unique aspect of this work is that the user can make additions to, or modify, these directives. The directives reside in a preprocessor library, and changes to this library can range from small changes to customize an existing library, to larger changes for porting a library, to completely replacing the library. The preprocessor is programmed with a library of directives written in a C-like language, called DL, that has added support for manipulating Fortran code fragments. The primary benefits to the user are twofold: it is fairly easy for any user to generate efficient, parallel code from Fortran-90 with embedded directives, and the long-term viability of the user's software is guaranteed. This is because the source code will always run on a serial machine (the directives are transparent to standard Fortran compilers), and the preprocessor library can be modified to work with different hardware and software environments. A 4000-line preprocessor library has been written and used to parallelize roughly 50,000 lines of groundwater modeling code. The programs have been ported to a wide range of parallel architectures, with performance similar to that of programs explicitly written for a parallel machine. Binaries of the preprocessor core, as well as the preprocessor library source code used in our groundwater modeling codes, are currently available.

  4. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining
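
    The central idea is that legacy and distributed views of the same array coexist, with mapping functions to move between them. A conceptual toy sketch of that idea (hypothetical names; Charon's real C/Fortran API is richer):

      import numpy as np

      class BlockMap:
          """Toy stand-in for a Charon-style mapping between a legacy
          (non-distributed) array and its block-distributed counterpart."""
          def __init__(self, n, nprocs):
              self.bounds = np.linspace(0, n, nprocs + 1).astype(int)

          def to_local(self, global_arr, rank):
              lo, hi = self.bounds[rank], self.bounds[rank + 1]
              return global_arr[lo:hi].copy()      # block owned by `rank`

          def to_global(self, local_blocks):
              return np.concatenate(local_blocks)  # reassemble the legacy view

      # during incremental parallelization both views exist side by side:
      n, nprocs = 10, 3
      legacy = np.arange(n, dtype=float)
      m = BlockMap(n, nprocs)
      blocks = [m.to_local(legacy, r) for r in range(nprocs)]
      assert np.array_equal(m.to_global(blocks), legacy)  # switching views is lossless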

  5. Parallel Adaptive Mesh Refinement

    SciTech Connect

    Diachin, L; Hornung, R; Plassmann, P; WIssink, A

    2005-03-04

    As large-scale, parallel computers have become more widely available and numerical models and algorithms have advanced, the range of physical phenomena that can be simulated has expanded dramatically. Many important science and engineering problems exhibit solutions with localized behavior where highly-detailed salient features or large gradients appear in certain regions which are separated by much larger regions where the solution is smooth. Examples include chemically-reacting flows with radiative heat transfer, high Reynolds number flows interacting with solid objects, and combustion problems where the flame front is essentially a two-dimensional sheet occupying a small part of a three-dimensional domain. Modeling such problems numerically requires approximating the governing partial differential equations on a discrete domain, or grid. Grid spacing is an important factor in determining the accuracy and cost of a computation. A fine grid may be needed to resolve key local features while a much coarser grid may suffice elsewhere. Employing a fine grid everywhere may be inefficient at best and, at worst, may make an adequately resolved simulation impractical. Moreover, the location and resolution of fine grid required for an accurate solution is a dynamic property of a problem's transient features and may not be known a priori. Adaptive mesh refinement (AMR) is a technique that can be used with both structured and unstructured meshes to adjust local grid spacing dynamically to capture solution features with an appropriate degree of resolution. Thus, computational resources can be focused where and when they are needed most to efficiently achieve an accurate solution without incurring the cost of a globally-fine grid. Figure 1.1 shows two example computations using AMR; on the left is a structured mesh calculation of an impulsively-sheared contact surface and on the right is the fuselage and volume discretization of an RAH-66 Comanche helicopter [35]. Note the
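
    The refinement decision itself can be sketched in a few lines (a generic gradient criterion chosen for illustration; production AMR codes use richer error estimators and manage whole hierarchies of patches):

      import numpy as np

      def flag_for_refinement(u, dx, threshold):
          # flag cells where the solution varies rapidly; only those cells
          # (plus buffer regions, omitted here) receive a finer grid patch
          return np.abs(np.gradient(u, dx)) > threshold

      x = np.linspace(-1.0, 1.0, 201)
      u = np.tanh(50.0 * x)                  # sharp front near x = 0, smooth elsewhere
      flags = flag_for_refinement(u, x[1] - x[0], threshold=5.0)
      print(f"{flags.sum()} of {flags.size} cells flagged")  # only the front refines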

  6. Parallel execution model for Prolog

    SciTech Connect

    Fagin, B.S.

    1987-01-01

    One candidate language for parallel symbolic computing is Prolog. Numerous ways of executing Prolog in parallel have been proposed, but current efforts suffer from several deficiencies. Many cannot support fundamental types of concurrency in Prolog. Other models are of purely theoretical interest, ignoring implementation costs. Detailed simulation studies of execution models are scarce; at present little is known about the costs and benefits of executing Prolog in parallel. In this thesis, a new parallel execution model for Prolog is presented: the PPP model, or Parallel Prolog Processor. The PPP supports AND-parallelism, OR-parallelism, and intelligent backtracking. An implementation of the PPP is described, through the extension of an existing Prolog abstract machine architecture. Several examples of PPP execution are presented, and compilation to the PPP abstract instruction set is discussed. The performance effects of this model are reported, based on a simulation of a large benchmark set. The implications of these results for parallel Prolog systems are discussed, and directions for future work are indicated.

  7. Parallelizing Monte Carlo with PMC

    SciTech Connect

    Rathkopf, J.A.; Jones, T.R.; Nessett, D.M.; Stanberry, L.C.

    1994-11-01

    PMC (Parallel Monte Carlo) is a system of generic interface routines that allows easy porting of Monte Carlo packages of large-scale physics simulation codes to Massively Parallel Processor (MPP) computers. By loading various versions of PMC, simulation code developers can configure their codes to run in several modes: serial, where Monte Carlo runs on the same processor as the rest of the code; parallel, where Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on other MPP processor(s); and distributed, where Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on a different machine. This multi-mode approach allows maintenance of a single simulation code source regardless of the target machine. PMC handles passing of messages between nodes on the MPP, passing of messages between a different machine and the MPP, distributing work between nodes, and providing independent, reproducible sequences of random numbers. Several production codes have been parallelized under the PMC system. Excellent parallel efficiency results in both the distributed and parallel modes if sufficient workload is available per processor. Experiences with a Monte Carlo photonics demonstration code and a Monte Carlo neutronics package are described.
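
    One requirement named above, independent yet reproducible random sequences per processor, can be illustrated with NumPy's seed-spawning facility (this shows the requirement, not PMC's own generator):

      import numpy as np

      # one master seed deterministically spawns independent child streams,
      # so a run is reproducible no matter how work is spread across nodes
      master = np.random.SeedSequence(12345)
      streams = [np.random.default_rng(s) for s in master.spawn(4)]

      # each "node" draws from its own stream; rerunning gives identical histories
      for rank, rng in enumerate(streams):
          print(f"node {rank}:", rng.random(3))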

  8. Reordering computations for parallel execution

    NASA Technical Reports Server (NTRS)

    Adams, L.

    1985-01-01

    Computations in the SOR algorithm are reordered to obtain parallelism at different levels while maintaining the same asymptotic rate of convergence as the rowwise ordering. A parallel program is written to illustrate these ideas, and actual machines for implementing the program are discussed.
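
    The classic instance of such a reordering is red-black (odd-even) ordering for the five-point Laplacian: points of one color depend only on points of the other, so each half-sweep can update its entire color in parallel while the asymptotic convergence rate of the rowwise ordering is retained. A straightforward serial-loop sketch, assuming a unit-square Poisson problem (a generic textbook formulation, not the report's program):

      import numpy as np

      def sor_red_black(u, f, h, omega, sweeps):
          # u: grid with boundary values in place; f: right-hand side of -Lap(u) = f
          for _ in range(sweeps):
              for color in (0, 1):              # red points, then black points
                  for i in range(1, u.shape[0] - 1):
                      start = 2 - (i + color) % 2   # first j with (i + j) % 2 == color
                      for j in range(start, u.shape[1] - 1, 2):
                          gs = 0.25 * (u[i-1, j] + u[i+1, j] + u[i, j-1]
                                       + u[i, j+1] + h * h * f[i, j])
                          u[i, j] += omega * (gs - u[i, j])  # SOR relaxation
          return u

      n = 33
      u = np.zeros((n, n))                      # homogeneous Dirichlet boundary
      f = np.ones((n, n))
      u = sor_red_black(u, f, 1.0 / (n - 1), omega=1.7, sweeps=200)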

  9. Hebbian learning in parallel and modular memories.

    PubMed

    Poon, C S; Shah, J V

    1998-02-01

    Many cognitive and sensorimotor functions in the brain involve parallel and modular memory subsystems that are adapted by activity-dependent Hebbian synaptic plasticity. This is in contrast to the multilayer perceptron model of supervised learning where sensory information is presumed to be integrated by a common pool of hidden units through backpropagation learning. Here we show that Hebbian learning in parallel and modular memories is more advantageous than backpropagation learning in lumped memories in two respects: it is computationally much more efficient and structurally much simpler to implement with biological neurons. Accordingly, we propose a more biologically relevant neural network model, called a tree-like perceptron, which is a simple modification of the multilayer perceptron model to account for the general neural architecture, neuronal specificity, and synaptic learning rule in the brain. The model features a parallel and modular architecture in which adaptation of the input-to-hidden connection follows either a Hebbian or anti-Hebbian rule depending on whether the hidden units are excitatory or inhibitory, respectively. The proposed parallel and modular architecture and implicit interplay between the types of synaptic plasticity and neuronal specificity are exhibited by some neocortical and cerebellar systems. PMID:9525034
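
    The rule pairing described above reduces, in its simplest form, to an outer-product weight update whose sign is set by the hidden unit's type (a generic textbook sketch; the paper's exact formulation may differ):

      import numpy as np

      rng = np.random.default_rng(0)
      eta = 0.01                       # learning rate
      x = rng.random(8)                # presynaptic (input) activity
      w_exc = rng.random(8)            # weights onto an excitatory hidden unit
      w_inh = rng.random(8)            # weights onto an inhibitory hidden unit

      y_exc = w_exc @ x                # postsynaptic activities
      y_inh = w_inh @ x
      w_exc += eta * y_exc * x         # Hebbian: co-activity strengthens the synapse
      w_inh -= eta * y_inh * x         # anti-Hebbian: co-activity weakens it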

  10. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.