Science.gov

Sample records for not4p functions parallel

  1. Massively parallel density functional calculations for thousands of atoms: KKRnano

    NASA Astrophysics Data System (ADS)

    Thiess, A.; Zeller, R.; Bolten, M.; Dederichs, P. H.; Blügel, S.

    2012-06-01

    Applications of existing precise electronic-structure methods based on density functional theory are typically limited to the treatment of about 1000 inequivalent atoms, which leaves unresolved many open questions in material science, e.g., on complex defects, interfaces, dislocations, and nanostructures. KKRnano is a new massively parallel linear scaling all-electron density functional algorithm in the framework of the Korringa-Kohn-Rostoker (KKR) Green's-function method. We conceptualized, developed, and optimized KKRnano for large-scale applications of many thousands of atoms without compromising on the precision of a full-potential all-electron method, i.e., it is a method without any shape approximation of the charge density or potential. A key element of the new method is the iterative solution of the sparse linear Dyson equation, which we parallelized atom by atom, across energy points in the complex plane and for each spin degree of freedom using the message passing interface standard, followed by a lower-level OpenMP parallelization. This hybrid four-level parallelization allows for an efficient use of up to 100000 processors on the latest generation of supercomputers. The iterative solution of the Dyson equation is significantly accelerated, employing preconditioning techniques making use of coarse-graining principles expressed in a block-circulant preconditioner. In this paper, we will describe the important elements of this new algorithm, focusing on the parallelization and preconditioning and showing scaling results for NiPd alloys up to 8192 atoms and 65536 processors. At the end, we present an order-N algorithm for large-scale simulations of metallic systems, making use of the nearsighted principle of the KKR Green's-function approach by introducing a truncation of the electron scattering to a local cluster of atoms, the size of which is determined by the requested accuracy. By exploiting this algorithm, we show linear scaling calculations of more

  2. Massively Parallel Functional Analysis of BRCA1 RING Domain Variants

    PubMed Central

    Starita, Lea M.; Young, David L.; Islam, Muhtadi; Kitzman, Jacob O.; Gullingsrud, Justin; Hause, Ronald J.; Fowler, Douglas M.; Parvin, Jeffrey D.; Shendure, Jay; Fields, Stanley

    2015-01-01

    Interpreting variants of uncertain significance (VUS) is a central challenge in medical genetics. One approach is to experimentally measure the functional consequences of VUS, but to date this approach has been post hoc and low throughput. Here we use massively parallel assays to measure the effects of nearly 2000 missense substitutions in the RING domain of BRCA1 on its E3 ubiquitin ligase activity and its binding to the BARD1 RING domain. From the resulting scores, we generate a model to predict the capacities of full-length BRCA1 variants to support homology-directed DNA repair, the essential role of BRCA1 in tumor suppression, and show that it outperforms widely used biological-effect prediction algorithms. We envision that massively parallel functional assays may facilitate the prospective interpretation of variants observed in clinical sequencing. PMID:25823446

  3. Administering truncated receive functions in a parallel messaging interface

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-12-09

    Administering truncated receive functions in a parallel messaging interface (`PMI`) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.

  4. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    SciTech Connect

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single stranded oligonucleotides affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  5. A hybrid method for the parallel computation of Green's functions

    SciTech Connect

    Petersen, Dan Erik; Li Song; Stokbro, Kurt; Sorensen, Hans Henrik B.; Hansen, Per Christian; Skelboe, Stig; Darve, Eric

    2009-08-01

    Quantum transport models for nanodevices using the non-equilibrium Green's function method require the repeated calculation of the block tridiagonal part of the Green's and lesser Green's function matrices. This problem is related to the calculation of the inverse of a sparse matrix. Because of the large number of times this calculation needs to be performed, this is computationally very expensive even on supercomputers. The classical approach is based on recurrence formulas which cannot be efficiently parallelized. This practically prevents the solution of large problems with hundreds of thousands of atoms. We propose new recurrences for a general class of sparse matrices to calculate Green's and lesser Green's function matrices which extend formulas derived by Takahashi and others. We show that these recurrences may lead to a dramatically reduced computational cost because they only require computing a small number of entries of the inverse matrix. Then, we propose a parallelization strategy for block tridiagonal matrices which involves a combination of Schur complement calculations and cyclic reduction. It achieves good scalability even on problems of modest size.

  6. Parallel functional programming in Sisal: Fictions, facts, and future

    SciTech Connect

    McGraw, J.R.

    1993-07-01

    This paper provides a status report on the progress of research and development on the functional language Sisal. This project focuses on providing a highly effective method of writing large scientific applications that can efficiently execute on a spectrum of different multiprocessors. The paper includes sections on the language definition, compilation strategies, and programming techniques intended for readers with little or no background with Sisal. The section on performance presents our most recent results on execution speed for shared-memory multiprocessors, our findings using Sisal to develop codes, and our experiences migrating the same source code to different machines. For large programs, the execution performance of Sisal (with minimal supporting advice from the programmer) usually exceeds that of the best available automatic, vector/parallel Fortran compilers. Our evidence also indicates that Sisal programs tend to be shorter in length, faster to write, and dearer to understand than equivalent algorithms in Fortran. The paper concludes with a substantial discussion of common criticisms of the language and our plans for addressing them. Most notably, efficient implementations for distributed memory machines are lacking; an issue we plan to remedy.

  7. Functional networks in parallel with cortical development associate with executive functions in children.

    PubMed

    Zhong, Jidan; Rifkin-Graboi, Anne; Ta, Anh Tuan; Yap, Kar Lai; Chuang, Kai-Hsiang; Meaney, Michael J; Qiu, Anqi

    2014-07-01

    Children begin performing similarly to adults on tasks requiring executive functions in late childhood, a transition that is probably due to neuroanatomical fine-tuning processes, including myelination and synaptic pruning. In parallel to such structural changes in neuroanatomical organization, development of functional organization may also be associated with cognitive behaviors in children. We examined 6- to 10-year-old children's cortical thickness, functional organization, and cognitive performance. We used structural magnetic resonance imaging (MRI) to identify areas with cortical thinning, resting-state fMRI to identify functional organization in parallel to cortical development, and working memory/response inhibition tasks to assess executive functioning. We found that neuroanatomical changes in the form of cortical thinning spread over bilateral frontal, parietal, and occipital regions. These regions were engaged in 3 functional networks: sensorimotor and auditory, executive control, and default mode network. Furthermore, we found that working memory and response inhibition only associated with regional functional connectivity, but not topological organization (i.e., local and global efficiency of information transfer) of these functional networks. Interestingly, functional connections associated with "bottom-up" as opposed to "top-down" processing were more clearly related to children's performance on working memory and response inhibition, implying an important role for brain systems involved in late childhood. PMID:23448875

  8. Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding.

    PubMed

    Keçeli, Murat; Zhang, Hong; Zapol, Peter; Dixon, David A; Wagner, Albert F

    2016-02-01

    The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ∼330,000 (∼5600) eigenvalues and eigenfunctions are obtained in ∼190 (∼5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPs is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. A parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.

  9. A two-level parallel direct search implementation for arbitrarily sized objective functions

    SciTech Connect

    Hutchinson, S.A.; Shadid, N.; Moffat, H.K.

    1994-12-31

    In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p{sub 2}) of processors. If, p, the number of processors available, is greater than or equal to 2p{sub 2} then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.

  10. Efficient time-dependent density functional theory approximations for hybrid density functionals: Analytical gradients and parallelization

    NASA Astrophysics Data System (ADS)

    Petrenko, Taras; Kossmann, Simone; Neese, Frank

    2011-02-01

    In this paper, we present the implementation of efficient approximations to time-dependent density functional theory (TDDFT) within the Tamm-Dancoff approximation (TDA) for hybrid density functionals. For the calculation of the TDDFT/TDA excitation energies and analytical gradients, we combine the resolution of identity (RI-J) algorithm for the computation of the Coulomb terms and the recently introduced "chain of spheres exchange" (COSX) algorithm for the calculation of the exchange terms. It is shown that for extended basis sets, the RIJCOSX approximation leads to speedups of up to 2 orders of magnitude compared to traditional methods, as demonstrated for hydrocarbon chains. The accuracy of the adiabatic transition energies, excited state structures, and vibrational frequencies is assessed on a set of 27 excited states for 25 molecules with the configuration interaction singles and hybrid TDDFT/TDA methods using various basis sets. Compared to the canonical values, the typical error in transition energies is of the order of 0.01 eV. Similar to the ground-state results, excited state equilibrium geometries differ by less than 0.3 pm in the bond distances and 0.5° in the bond angles from the canonical values. The typical error in the calculated excited state normal coordinate displacements is of the order of 0.01, and relative error in the calculated excited state vibrational frequencies is less than 1%. The errors introduced by the RIJCOSX approximation are, thus, insignificant compared to the errors related to the approximate nature of the TDDFT methods and basis set truncation. For TDDFT/TDA energy and gradient calculations on Ag-TB2-helicate (156 atoms, 2732 basis functions), it is demonstrated that the COSX algorithm parallelizes almost perfectly (speedup ˜26-29 for 30 processors). The exchange-correlation terms also parallelize well (speedup ˜27-29 for 30 processors). The solution of the Z-vector equations shows a speedup of ˜24 on 30 processors. The

  11. Parallel computers

    SciTech Connect

    Treveaven, P.

    1989-01-01

    This book presents an introduction to object-oriented, functional, and logic parallel computing on which the fifth generation of computer systems will be based. Coverage includes concepts for parallel computing languages, a parallel object-oriented system (DOOM) and its language (POOL), an object-oriented multilevel VLSI simulator using POOL, and implementation of lazy functional languages on parallel architectures.

  12. GPU-based parallel group ICA for functional magnetic resonance data.

    PubMed

    Jing, Yanshan; Zeng, Weiming; Wang, Nizhuan; Ren, Tianlong; Shi, Yingchao; Yin, Jun; Xu, Qi

    2015-04-01

    The goal of our study is to develop a fast parallel implementation of group independent component analysis (ICA) for functional magnetic resonance imaging (fMRI) data using graphics processing units (GPU). Though ICA has become a standard method to identify brain functional connectivity of the fMRI data, it is computationally intensive, especially has a huge cost for the group data analysis. GPU with higher parallel computation power and lower cost are used for general purpose computing, which could contribute to fMRI data analysis significantly. In this study, a parallel group ICA (PGICA) on GPU, mainly consisting of GPU-based PCA using SVD and Infomax-ICA, is presented. In comparison to the serial group ICA, the proposed method demonstrated both significant speedup with 6-11 times and comparable accuracy of functional networks in our experiments. This proposed method is expected to perform the real-time post-processing for fMRI data analysis.

  13. Method, systems, and computer program products for implementing function-parallel network firewall

    DOEpatents

    Fulp, Errin W.; Farley, Ryan J.

    2011-10-11

    Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.

  14. Parallel sites implicate functional convergence of the hearing gene prestin among echolocating mammals.

    PubMed

    Liu, Zhen; Qi, Fei-Yan; Zhou, Xin; Ren, Hai-Qing; Shi, Peng

    2014-09-01

    Echolocation is a sensory system whereby certain mammals navigate and forage using sound waves, usually in environments where visibility is limited. Curiously, echolocation has evolved independently in bats and whales, which occupy entirely different environments. Based on this phenotypic convergence, recent studies identified several echolocation-related genes with parallel sites at the protein sequence level among different echolocating mammals, and among these, prestin seems the most promising. Although previous studies analyzed the evolutionary mechanism of prestin, the functional roles of the parallel sites in the evolution of mammalian echolocation are not clear. By functional assays, we show that a key parameter of prestin function, 1/α, is increased in all echolocating mammals and that the N7T parallel substitution accounted for this functional convergence. Moreover, another parameter, V1/2, was shifted toward the depolarization direction in a toothed whale, the bottlenose dolphin (Tursiops truncatus) and a constant-frequency (CF) bat, the Stoliczka's trident bat (Aselliscus stoliczkanus). The parallel site of I384T between toothed whales and CF bats was responsible for this functional convergence. Furthermore, the two parameters (1/α and V1/2) were correlated with mammalian high-frequency hearing, suggesting that the convergent changes of the prestin function in echolocating mammals may play important roles in mammalian echolocation. To our knowledge, these findings present the functional patterns of echolocation-related genes in echolocating mammals for the first time and rigorously demonstrate adaptive parallel evolution at the protein sequence level, paving the way to insights into the molecular mechanism underlying mammalian echolocation. PMID:24951728

  15. Parallel sites implicate functional convergence of the hearing gene prestin among echolocating mammals.

    PubMed

    Liu, Zhen; Qi, Fei-Yan; Zhou, Xin; Ren, Hai-Qing; Shi, Peng

    2014-09-01

    Echolocation is a sensory system whereby certain mammals navigate and forage using sound waves, usually in environments where visibility is limited. Curiously, echolocation has evolved independently in bats and whales, which occupy entirely different environments. Based on this phenotypic convergence, recent studies identified several echolocation-related genes with parallel sites at the protein sequence level among different echolocating mammals, and among these, prestin seems the most promising. Although previous studies analyzed the evolutionary mechanism of prestin, the functional roles of the parallel sites in the evolution of mammalian echolocation are not clear. By functional assays, we show that a key parameter of prestin function, 1/α, is increased in all echolocating mammals and that the N7T parallel substitution accounted for this functional convergence. Moreover, another parameter, V1/2, was shifted toward the depolarization direction in a toothed whale, the bottlenose dolphin (Tursiops truncatus) and a constant-frequency (CF) bat, the Stoliczka's trident bat (Aselliscus stoliczkanus). The parallel site of I384T between toothed whales and CF bats was responsible for this functional convergence. Furthermore, the two parameters (1/α and V1/2) were correlated with mammalian high-frequency hearing, suggesting that the convergent changes of the prestin function in echolocating mammals may play important roles in mammalian echolocation. To our knowledge, these findings present the functional patterns of echolocation-related genes in echolocating mammals for the first time and rigorously demonstrate adaptive parallel evolution at the protein sequence level, paving the way to insights into the molecular mechanism underlying mammalian echolocation.

  16. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    Charon is a software toolkit that enables engineers to develop high-performing message-passing programs in a convenient and piecemeal fashion. Emphasis is on rapid program development and prototyping. In this report a detailed description of the functional design of the toolkit is presented. It is illustrated by the stepwise parallelization of two representative code examples.

  17. Analysis and selection of optimal function implementations in massively parallel computer

    DOEpatents

    Archer, Charles Jens; Peters, Amanda; Ratterman, Joseph D.

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.

  18. DGDFT: A massively parallel method for large scale density functional theory calculations

    SciTech Connect

    Hu, Wei Yang, Chao; Lin, Lin

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10{sup −4} Hartree/atom in terms of the error of energy and 6.2 × 10{sup −4} Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  19. Parallel distributed processing and neural networks: origins, methodology and cognitive functions.

    PubMed

    Parks, R W; Long, D L; Levine, D S; Crockett, D J; McGeer, E G; McGeer, P L; Dalton, I E; Zec, R F; Becker, R E; Coburn, K L

    1991-10-01

    Parallel Distributed Processing (PDP), a computational methodology with origins in Associationism, is used to provide empirical information regarding neurobiological systems. Recently, supercomputers have enabled neuroscientists to model brain behavior-relationships. An overview of supercomputer architecture demonstrates the advantages of parallel over serial processing. Histological data provide physical evidence of the parallel distributed nature of certain aspects of the human brain, as do corresponding computer simulations. Whereas sensory networks follow more sequential neural network pathways, in vivo brain imaging studies of attention and rudimentary language tasks appear to involve multiple cortical and subcortical areas. Controversy remains as to whether associative models or Artificial Intelligence symbolic models better reflect neural networks of cognitive functions; however, considerable interest has shifted towards associative models.

  20. Storing files in a parallel computing system based on user-specified parser function

    DOEpatents

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

    2014-10-21

    Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.

  1. Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project.

    PubMed

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L

    2012-06-13

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures. PMID:22562950

  2. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    NASA Astrophysics Data System (ADS)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

    2012-06-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  3. Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project.

    PubMed

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L

    2012-06-13

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  4. Structure-Function Relationships of Postnatal Tendon Development: A Parallel to Healing

    PubMed Central

    Connizzo, Brianne K.; Yannascoli, Sarah M.; Soslowsky, Louis J.

    2013-01-01

    This review highlights recent research on structure-function relationships in tendon and comments on the parallels between development and healing. The processes of tendon development and collagen fibrillogenesis are reviewed, but due to the abundance of information in this field, this work focuses primarily on characterizing the mechanical behavior of mature and developing tendon, and how the latter parallels healing tendon. The role that extracellular matrix components, mainly collagen, proteoglycans, and collagen cross-links, play in determining the mechanical behavior of tendon will be examined in this review. Specifically, collagen fiber re-alignment and collagen fibril uncrimping relate mechanical behavior to structural alterations during development and during healing. Finally, attention is paid to a number of recent efforts to augment injured tendon and how future efforts could focus on recreating the important structure-function relationships reviewed here. PMID:23357642

  5. Optimization of a parallel permutation testing function for the SPRINT R package.

    PubMed

    Petrou, Savvas; Sloan, Terence M; Mewissen, Muriel; Forster, Thorsten; Piotrowski, Michal; Dobrzelecki, Bartosz; Ghazal, Peter; Trew, Arthur; Hill, Jon

    2011-12-10

    The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function has close to optimal scaling on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, and so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms including cloud resources and a common desktop machine with multiprocessing capabilities. Copyright © 2011 John Wiley & Sons, Ltd. PMID:23335858

  6. A cost-effective methodology for the design of massively-parallel VLSI functional units

    NASA Technical Reports Server (NTRS)

    Venkateswaran, N.; Sriram, G.; Desouza, J.

    1993-01-01

    In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the the performance of functional units designed by our method yields promising results.

  7. Parallel fixed point implementation of a radial basis function network in an FPGA.

    PubMed

    de Souza, Alisson C D; Fernandes, Marcelo A C

    2014-01-01

    This paper proposes a parallel fixed point radial basis function (RBF) artificial neural network (ANN), implemented in a field programmable gate array (FPGA) trained online with a least mean square (LMS) algorithm. The processing time and occupied area were analyzed for various fixed point formats. The problems of precision of the ANN response for nonlinear classification using the XOR gate and interpolation using the sine function were also analyzed in a hardware implementation. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA.

  8. Parallel Fixed Point Implementation of a Radial Basis Function Network in an FPGA

    PubMed Central

    de Souza, Alisson C. D.; Fernandes, Marcelo A. C.

    2014-01-01

    This paper proposes a parallel fixed point radial basis function (RBF) artificial neural network (ANN), implemented in a field programmable gate array (FPGA) trained online with a least mean square (LMS) algorithm. The processing time and occupied area were analyzed for various fixed point formats. The problems of precision of the ANN response for nonlinear classification using the XOR gate and interpolation using the sine function were also analyzed in a hardware implementation. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA. PMID:25268918

  9. Locating and computing in parallel all the simple roots of special functions using PVM

    NASA Astrophysics Data System (ADS)

    Plagianakos, V. P.; Nousis, N. K.; Vrahatis, M. N.

    2001-08-01

    An algorithm is proposed for locating and computing in parallel and with certainty all the simple roots of any twice continuously differentiable function in any specific interval. To compute with certainty all the roots, the proposed method is heavily based on the knowledge of the total number of roots within the given interval. To obtain this information we use results from topological degree theory and, in particular, the Kronecker-Picard approach. This theory gives a formula for the computation of the total number of roots of a system of equations within a given region, which can be computed in parallel. With this tool in hand, we construct a parallel procedure for the localization and isolation of all the roots by dividing the given region successively and applying the above formula to these subregions until the final domains contain at the most one root. The subregions with no roots are discarded, while for the rest a modification of the well-known bisection method is employed for the computation of the contained root. The new aspect of the present contribution is that the computation of the total number of zeros using the Kronecker-Picard integral as well as the localization and computation of all the roots is performed in parallel using the parallel virtual machine (PVM). PVM is an integrated set of software tools and libraries that emulates a general-purpose, flexible, heterogeneous concurrent computing framework on interconnected computers of varied architectures. The proposed algorithm has large granularity and low synchronization, and is robust. It has been implemented and tested and our experience is that it can massively compute with certainty all the roots in a certain interval. Performance information from massive computations related to a recently proposed conjecture due to Elbert (this issue, J. Comput. Appl. Math. 133 (2001) 65-83) is reported.

  10. Electromagnetic semitransparent δ-function plate: Casimir interaction energy between parallel infinitesimally thin plates

    NASA Astrophysics Data System (ADS)

    Parashar, Prachi; Milton, Kimball A.; Shajesh, K. V.; Schaden, M.

    2012-10-01

    We derive boundary conditions for electromagnetic fields on a δ-function plate. The optical properties of such a plate are shown to necessarily be anisotropic in that they only depend on the transverse properties of the plate. We unambiguously obtain the boundary conditions for a perfectly conducting δ-function plate in the limit of infinite dielectric response. We show that a material does not “optically vanish” in the thin-plate limit. The thin-plate limit of a plasma slab of thickness d with plasma frequency ωp2=ζp/d reduces to a δ-function plate for frequencies (ω=iζ) satisfying ζd≪ζpd≪1. We show that the Casimir interaction energy between two parallel perfectly conducting δ-function plates is the same as that for parallel perfectly conducting slabs. Similarly, we show that the interaction energy between an atom and a perfect electrically conducting δ-function plate is the usual Casimir-Polder energy, which is verified by considering the thin-plate limit of dielectric slabs. The “thick” and “thin” boundary conditions considered by Bordag are found to be identical in the sense that they lead to the same electromagnetic fields.

  11. A parallel approach of COFFEE objective function to multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.

    2015-09-01

    The computational tools to assist genomic analyzes show even more necessary due to fast increasing of data amount available. With high computational costs of deterministic algorithms for sequence alignments, many works concentrate their efforts in the development of heuristic approaches to multiple sequence alignments. However, the selection of an approach, which offers solutions with good biological significance and feasible execution time, is a great challenge. Thus, this work aims to show the parallelization of the processing steps of MSA-GA tool using multithread paradigm in the execution of COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sequences sets with low similarity are aligned. Then, in studies previously performed we implemented the COFFEE objective function in the tool to smooth these distortions. Although the nature of COFFEE objective function implies in the increasing of execution time, this approach presents points, which can be executed in parallel. With the improvements implemented in this work, we can verify the execution time of new approach is 24% faster than the sequential approach with COFFEE. Moreover, the COFFEE multithreaded approach is more efficient than WSP, because besides it is slightly fast, its biological results are better.

  12. Finding zeros of nonlinear functions using the hybrid parallel cell mapping method

    NASA Astrophysics Data System (ADS)

    Xiong, Fu-Rui; Schütze, Oliver; Ding, Qian; Sun, Jian-Qiao

    2016-05-01

    Analysis of nonlinear dynamical systems including finding equilibrium states and stability boundaries often leads to a problem of finding zeros of vector functions. However, finding all the zeros of a set of vector functions in the domain of interest is quite a challenging task. This paper proposes a zero finding algorithm that combines the cell mapping methods and the subdivision techniques. Both the simple cell mapping (SCM) and generalized cell mapping (GCM) methods are used to identify a covering set of zeros. The subdivision technique is applied to enhance the solution resolution. The parallel implementation of the proposed method is discussed extensively. Several examples are presented to demonstrate the application and effectiveness of the proposed method. We then extend the study of finding zeros to the problem of finding stability boundaries of potential fields. Examples of two and three dimensional potential fields are studied. In addition to the effectiveness in finding the stability boundaries, the proposed method can handle several millions of cells in just a few seconds with the help of parallel computing in graphics processing units (GPUs).

  13. Line-field parallel swept source MHz OCT for structural and functional retinal imaging

    PubMed Central

    Fechtig, Daniel J.; Grajciar, Branislav; Schmoll, Tilman; Blatter, Cedric; Werkmeister, Rene M.; Drexler, Wolfgang; Leitgeb, Rainer A.

    2015-01-01

    We demonstrate three-dimensional structural and functional retinal imaging with line-field parallel swept source imaging (LPSI) at acquisition speeds of up to 1 MHz equivalent A-scan rate with sensitivity better than 93.5 dB at a central wavelength of 840 nm. The results demonstrate competitive sensitivity, speed, image contrast and penetration depth when compared to conventional point scanning OCT. LPSI allows high-speed retinal imaging of function and morphology with commercially available components. We further demonstrate a method that mitigates the effect of the lateral Gaussian intensity distribution across the line focus and demonstrate and discuss the feasibility of high-speed optical angiography for visualization of the retinal microcirculation. PMID:25798298

  14. The Functional Curli Amyloid Is Not Based on In-register Parallel β-Sheet Structure*

    PubMed Central

    Shewmaker, Frank; McGlinchey, Ryan P.; Thurber, Kent R.; McPhie, Peter; Dyda, Fred; Tycko, Robert; Wickner, Reed B.

    2009-01-01

    The extracellular curli proteins of Enterobacteriaceae form fibrous structures that are involved in biofilm formation and adhesion to host cells. These curli fibrils are considered a functional amyloid because they are not a consequence of misfolding, but they have many of the properties of protein amyloid. We confirm that fibrils formed by CsgA and CsgB, the primary curli proteins of Escherichia coli, possess many of the hallmarks typical of amyloid. Moreover we demonstrate that curli fibrils possess the cross-β structure that distinguishes protein amyloid. However, solid state NMR experiments indicate that curli structure is not based on an in-register parallel β-sheet architecture, which is common to many human disease-associated amyloids and the yeast prion amyloids. Solid state NMR and electron microscopy data are consistent with a β-helix-like structure but are not sufficient to establish such a structure definitively. PMID:19574225

  15. Leukocytosis and natural killer cell function parallel neurobehavioral fatigue induced by 64 hours of sleep deprivation.

    PubMed Central

    Dinges, D F; Douglas, S D; Zaugg, L; Campbell, D E; McMann, J M; Whitehouse, W G; Orne, E C; Kapoor, S C; Icaza, E; Orne, M T

    1994-01-01

    The hypothesis that sleep deprivation depresses immune function was tested in 20 adults, selected on the basis of their normal blood chemistry, monitored in a laboratory for 7 d, and kept awake for 64 h. At 2200 h each day measurements were taken of total leukocytes (WBC), monocytes, granulocytes, lymphocytes, eosinophils, erythrocytes (RBC), B and T lymphocyte subsets, activated T cells, and natural killer (NK) subpopulations (CD56/CD8 dual-positive cells, CD16-positive cells, CD57-positive cells). Functional tests included NK cytotoxicity, lymphocyte stimulation with mitogens, and DNA analysis of cell cycle. Sleep loss was associated with leukocytosis and increased NK cell activity. At the maximum sleep deprivation, increases were observed in counts of WBC, granulocytes, monocytes, NK activity, and the proportion of lymphocytes in the S phase of the cell cycle. Changes in monocyte counts correlated with changes in other immune parameters. Counts of CD4, CD16, CD56, and CD57 lymphocytes declined after one night without sleep, whereas CD56 and CD57 counts increased after two nights. No changes were observed in other lymphocyte counts, in proliferative responses to mitogens, or in plasma levels of cortisol or adrenocorticotropin hormone. The physiologic leukocytosis and NK activity increases during deprivation were eliminated by recovery sleep in a manner parallel to neurobehavioral function, suggesting that the immune alterations may be associated with biological pressure for sleep. PMID:7910171

  16. Depth estimation via parallel coevolution of disparity functions for area-based stereo

    NASA Astrophysics Data System (ADS)

    Liatsis, Panos; Goulermas, John Y.

    2001-02-01

    12 A novel system for depth estimation is proposed with the use of Symbiotic Genetic Algorithms for the continuous problem of disparity surface approximation. The approach is based on the decomposition of the entire surface to very small non- overlapping patches described by low order bivariate polynomials and the use of symbiotic optimization to enforce smoothness at the boundaries of these patches, so that the entire surface can be approximated in a smooth piecewise fashion by functionals of local support. Such optimization is amenable to a massive parallel implementation, since each patch is optimized by a different execution unit and each unit communicates through its cost function only with its four-connected neighbors. The method makes use of various existing crossover and mutation schemes for real-valued chromosome representations and a new problem-specific mechanism for generating and hybridizing the initial populations. The proposed multi-objective cost function enforces photometric similarity and smoothness between the patch boundaries at a local scale, which in the long term give rise to a globally smooth disparity surface.

  17. Parallel Loss-of-Function at the RPM1 Bacterial Resistance Locus in Arabidopsis thaliana

    PubMed Central

    Rose, Laura; Atwell, Susanna; Grant, Murray; Holub, Eric B.

    2012-01-01

    Dimorphism at the Resistance to Pseudomonas syringae pv. maculicola 1 (RPM1) locus is well documented in natural populations of Arabidopsis thaliana and has been portrayed as a long-term balanced polymorphism. The haplotype from resistant plants contains the RPM1 gene, which enables these plants to recognize at least two structurally unrelated bacterial effector proteins (AvrB and AvrRpm1) from bacterial crop pathogens. A complete deletion of the RPM1 coding sequence has been interpreted as a single event resulting in susceptibility in these individuals. Consequently, the ability to revert to resistance or for alternative R-gene specificities to evolve at this locus has also been lost in these individuals. Our survey of variation at the RPM1 locus in a large species-wide sample of A. thaliana has revealed four new loss-of-function alleles that contain most of the intervening sequence of the RPM1 open reading frame. Multiple loss-of-function alleles may have originated due to the reported intrinsic cost to plants expressing the RPM1 protein. The frequency and geographic distribution of rpm1 alleles observed in our survey indicate the parallel origin and maintenance of these loss-of-function mutations and reveal a more complex history of natural selection at this locus than previously thought. PMID:23272006

  18. Parallel Functional Changes in Independent Testis-Specific Duplicates of Aldehyde dehydrogenase in Drosophila

    PubMed Central

    Chakraborty, Mahul; Fry, James D.

    2015-01-01

    A large proportion of duplicates, originating from ubiquitously expressed genes, acquire testis-biased expression. Identifying the underlying cause of this observation requires determining whether the duplicates have altered functions relative to the parental genes. Typically, statistical methods are used to test for positive selection, signature of which in protein sequence of duplicates implies functional divergence. When assumptions are violated, however, such tests can lead to false inference of positive selection. More convincing evidence for naturally selected functional changes would be the occurrence of structural changes with similar functional consequences in independent duplicates of the same gene. We investigated two testis-specific duplicates of the broadly expressed enzyme gene Aldehyde dehydrogenase (Aldh) that arose in different Drosophila lineages. The duplicates show a typical pattern of accelerated amino acid substitutions relative to their broadly expressed paralogs, with statistical evidence for positive selection in both cases. Importantly, in both duplicates, width of the entrance to the substrate binding site, known a priori to influence substrate specificity, and otherwise conserved throughout the genus Drosophila, has been reduced, resulting in narrowing of the entrance. Protein structure modeling suggests that the reduction of the size of the enzyme’s substrate entry channel, which is likely to shift substrate specificity toward smaller aldehydes, is accounted for by the positively selected parallel substitutions in one duplicate but not the other. Evolution of the testis-specific duplicates was accompanied by reduction in expression of the ancestral Aldh in males, supporting the hypothesis that the duplicates may have helped resolve intralocus sexual conflict over Aldh function. PMID:25564519

  19. Parallel functional changes in independent testis-specific duplicates of Aldehyde dehydrogenase in Drosophila.

    PubMed

    Chakraborty, Mahul; Fry, James D

    2015-04-01

    A large proportion of duplicates, originating from ubiquitously expressed genes, acquire testis-biased expression. Identifying the underlying cause of this observation requires determining whether the duplicates have altered functions relative to the parental genes. Typically, statistical methods are used to test for positive selection, signature of which in protein sequence of duplicates implies functional divergence. When assumptions are violated, however, such tests can lead to false inference of positive selection. More convincing evidence for naturally selected functional changes would be the occurrence of structural changes with similar functional consequences in independent duplicates of the same gene. We investigated two testis-specific duplicates of the broadly expressed enzyme gene Aldehyde dehydrogenase (Aldh) that arose in different Drosophila lineages. The duplicates show a typical pattern of accelerated amino acid substitutions relative to their broadly expressed paralogs, with statistical evidence for positive selection in both cases. Importantly, in both duplicates, width of the entrance to the substrate binding site, known a priori to influence substrate specificity, and otherwise conserved throughout the genus Drosophila, has been reduced, resulting in narrowing of the entrance. Protein structure modeling suggests that the reduction of the size of the enzyme's substrate entry channel, which is likely to shift substrate specificity toward smaller aldehydes, is accounted for by the positively selected parallel substitutions in one duplicate but not the other. Evolution of the testis-specific duplicates was accompanied by reduction in expression of the ancestral Aldh in males, supporting the hypothesis that the duplicates may have helped resolve intralocus sexual conflict over Aldh function.

  20. Saccharomyces cerevisiae MPT5 and SSD1 function in parallel pathways to promote cell wall integrity.

    PubMed Central

    Kaeberlein, Matt; Guarente, Leonard

    2002-01-01

    Yeast MPT5 (UTH4) is a limiting component for longevity. We show here that MPT5 also functions to promote cell wall integrity. Loss of Mpt5p results in phenotypes associated with a weakened cell wall, including sorbitol-remedial temperature sensitivity and sensitivities to calcofluor white and sodium dodecyl sulfate. Additionally, we find that mutation of MPT5, in the absence of SSD1-V, is lethal in combination with loss of either Ccr4p or Swi4p. These synthetic lethal interactions are suppressed by the SSD1-V allele. Furthermore, we have provided evidence that the short life span caused by loss of Mpt5p is due to a weakened cell wall. This cell wall defect may be the result of abnormal chitin biosynthesis or accumulation. These analyses have defined three genetic pathways that function in parallel to promote cell integrity: an Mpt5p-containing pathway, an Ssd1p-containing pathway, and a Pkc1p-dependent pathway. This work also provides evidence that post-transcriptional regulation is likely to be important both for maintaining cell integrity and for promoting longevity. PMID:11805047

  1. Functionalized Polymers-Emerging Versatile Tools for Solution-Phase Chemistry and Automated Parallel Synthesis.

    PubMed

    Kirschning, Andreas; Monenschein, Holger; Wittenberg, Rüdiger

    2001-02-16

    As part of the dramatic changes associated with the need for preparing compound libraries in pharmaceutical and agrochemical research laboratories, industry searches for new technologies that allow for the automation of synthetic processes. Since the pioneering work by Merrifield polymeric supports have been identified to play a key role in this field however, polymer-assisted solution-phase synthesis which utilizes immobilized reagents and catalysts has only recently begun to flourish. Polymer-assisted solution-phase synthesis has various advantages over conventional solution-phase chemistry, such as the ease of separation of the supported species from a reaction mixture by filtration and washing, the opportunity to use an excess of the reagent to force the reaction to completion without causing workup problems, and the adaptability to continuous-flow processes. Various strategies for employing functionalized polymers stoichiometrically have been developed. Apart from reagents that are covalently or ionically attached to the polymeric backbone and which are released into solution in the presence of a suitable substrate, scavenger reagents play an increasingly important role in purifying reaction mixtures. Employing functionalized polymers in solution-phase synthesis has been shown to be extremely useful in automated parallel synthesis and multistep sequences. So far, compound libraries containing as many as 88 members have been generated by using several polymer-bound reagents one after another. Furthermore, it has been demonstrated that complex natural products like the alkaloids (+/-)-oxomaritidine and (+/-)-epimaritidine can be prepared by a sequence of five and six consecutive polymer-assisted steps, respectively, and the potent analgesic compound (+/-)-epibatidine in twelve linear steps ten of which are based on functionalized polymers. These developments reveal the great future prospects of polymer-assisted solution-phase synthesis.

  2. Parallel-META 2.0: Enhanced Metagenomic Data Analysis with Functional Annotation, High Performance Computing and Advanced Visualization

    PubMed Central

    Song, Baoxing; Xu, Jian; Ning, Kang

    2014-01-01

    The metagenomic method directly sequences and analyses genome information from microbial communities. The main computational tasks for metagenomic analyses include taxonomical and functional structure analysis for all genomes in a microbial community (also referred to as a metagenomic sample). With the advancement of Next Generation Sequencing (NGS) techniques, the number of metagenomic samples and the data size for each sample are increasing rapidly. Current metagenomic analysis is both data- and computation- intensive, especially when there are many species in a metagenomic sample, and each has a large number of sequences. As such, metagenomic analyses require extensive computational power. The increasing analytical requirements further augment the challenges for computation analysis. In this work, we have proposed Parallel-META 2.0, a metagenomic analysis software package, to cope with such needs for efficient and fast analyses of taxonomical and functional structures for microbial communities. Parallel-META 2.0 is an extended and improved version of Parallel-META 1.0, which enhances the taxonomical analysis using multiple databases, improves computation efficiency by optimized parallel computing, and supports interactive visualization of results in multiple views. Furthermore, it enables functional analysis for metagenomic samples including short-reads assembly, gene prediction and functional annotation. Therefore, it could provide accurate taxonomical and functional analyses of the metagenomic samples in high-throughput manner and on large scale. PMID:24595159

  3. Parallel-META 2.0: enhanced metagenomic data analysis with functional annotation, high performance computing and advanced visualization.

    PubMed

    Su, Xiaoquan; Pan, Weihua; Song, Baoxing; Xu, Jian; Ning, Kang

    2014-01-01

    The metagenomic method directly sequences and analyses genome information from microbial communities. The main computational tasks for metagenomic analyses include taxonomical and functional structure analysis for all genomes in a microbial community (also referred to as a metagenomic sample). With the advancement of Next Generation Sequencing (NGS) techniques, the number of metagenomic samples and the data size for each sample are increasing rapidly. Current metagenomic analysis is both data- and computation- intensive, especially when there are many species in a metagenomic sample, and each has a large number of sequences. As such, metagenomic analyses require extensive computational power. The increasing analytical requirements further augment the challenges for computation analysis. In this work, we have proposed Parallel-META 2.0, a metagenomic analysis software package, to cope with such needs for efficient and fast analyses of taxonomical and functional structures for microbial communities. Parallel-META 2.0 is an extended and improved version of Parallel-META 1.0, which enhances the taxonomical analysis using multiple databases, improves computation efficiency by optimized parallel computing, and supports interactive visualization of results in multiple views. Furthermore, it enables functional analysis for metagenomic samples including short-reads assembly, gene prediction and functional annotation. Therefore, it could provide accurate taxonomical and functional analyses of the metagenomic samples in high-throughput manner and on large scale.

  4. Construction and functional characterization of double and triple mutants of parallel beta-bulge of ubiquitin.

    PubMed

    Sharma, Mrinal; Prabha, C Ratna

    2011-12-01

    Ubiquitin, a small eukaryotic protein serving as a post-translational modification on many important proteins, plays central role in cellular homeostasis and cell cycle regulation. Ubiquitin features two beta-bulges, the second beta-bulge, located at the C-terminal region of the protein along with type II turn, holds 3 residues Glu64(1), Ser65(2) and Gln2(X). Percent frequency of occurrence of such a sequence in parallel beta-bulge is very low. However, the sequence and structure have been conserved in ubiquitin through out the evolution. Present study involves replacement of residues in unusual beta-bulge of ubiquitin by introducing mutations in combination through site directed mutagenesis, generating double and triple mutants and their functional characterization. Mutant ubiquitins cloned in yeast expression vector YEp96 tested for growth profile, viability assay and heat stress complementation study have revealed significant decrease in growth rate, loss of viability and non-complementation of heat sensitive phenotype with UbE64G-S65D and UbQ2N-E64G-S65D mutations. However, UbQ2N-S65D did not show any negative effects in the above assays. Present results show that, replacement of residues in beta-bulge of ubiquitin exerts severe effects on growth and viability in Saccharomyces cerevisiae due to functional failure of the mutant ubiquitins UbE64G-S65D and UbQ2N-E64G-S65D.

  5. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.

    PubMed

    Natarajan, Chandrasekhar; Projecto-Garcia, Joana; Moriyama, Hideaki; Weber, Roy E; Muñoz-Fuentes, Violeta; Green, Andy J; Kopuchian, Cecilia; Tubaro, Pablo L; Alza, Luis; Bulgarella, Mariana; Smith, Matthew M; Wilson, Robert E; Fago, Angela; McCracken, Kevin G; Storz, Jay F

    2015-12-01

    A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level.

  6. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level

    PubMed Central

    Natarajan, Chandrasekhar; Projecto-Garcia, Joana; Moriyama, Hideaki; Weber, Roy E.; Muñoz-Fuentes, Violeta; Green, Andy J.; Kopuchian, Cecilia; Tubaro, Pablo L.; Alza, Luis; Bulgarella, Mariana; Smith, Matthew M.; Wilson, Robert E.; Fago, Angela; McCracken, Kevin G.; Storz, Jay F.

    2015-01-01

    A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level. PMID:26637114

  7. Energy distribution functions of kilovolt ions parallel and perpendicular to the magnetic field of a modified Penning discharge

    NASA Technical Reports Server (NTRS)

    Roth, R. J.

    1973-01-01

    The distribution function of ion energy parallel to the magnetic field of a modified Penning discharge has been measured with a retarding potential energy analyzer. These ions escaped through one of the throats of the magnetic mirror geometry. Simultaneous measurements of the ion energy distribution function perpendicular to the magnetic field have been made with a charge exchange neutral detector. The ion energy distribution functions are approximately Maxwellian, and the parallel and perpendicular kinetic temperatures are equal within experimental error. These results suggest that turbulent processes previously observed in this discharge Maxwellianize the velocity distribution along a radius in velocity space and cause an isotropic energy distribution. When the distributions depart from Maxwellian, they are enhanced above the Maxwellian tail.

  8. Parallel processing in the honeybee olfactory pathway: structure, function, and evolution.

    PubMed

    Rössler, Wolfgang; Brill, Martin F

    2013-11-01

    Animals face highly complex and dynamic olfactory stimuli in their natural environments, which require fast and reliable olfactory processing. Parallel processing is a common principle of sensory systems supporting this task, for example in visual and auditory systems, but its role in olfaction remained unclear. Studies in the honeybee focused on a dual olfactory pathway. Two sets of projection neurons connect glomeruli in two antennal-lobe hemilobes via lateral and medial tracts in opposite sequence with the mushroom bodies and lateral horn. Comparative studies suggest that this dual-tract circuit represents a unique adaptation in Hymenoptera. Imaging studies indicate that glomeruli in both hemilobes receive redundant sensory input. Recent simultaneous multi-unit recordings from projection neurons of both tracts revealed widely overlapping response profiles strongly indicating parallel olfactory processing. Whereas lateral-tract neurons respond fast with broad (generalistic) profiles, medial-tract neurons are odorant specific and respond slower. In analogy to "what-" and "where" subsystems in visual pathways, this suggests two parallel olfactory subsystems providing "what-" (quality) and "when" (temporal) information. Temporal response properties may support across-tract coincidence coding in higher centers. Parallel olfactory processing likely enhances perception of complex odorant mixtures to decode the diverse and dynamic olfactory world of a social insect.

  9. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.

  10. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    NASA Technical Reports Server (NTRS)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  11. Parallel and serial processing of haptic information in man: effects of parietal lesions on sensorimotor hand function.

    PubMed

    Knecht, S; Kunesch, E; Schnitzler, A

    1996-07-01

    Recent animal studies have shown that there is an evolutionary shift within the order of primates from parallel to serial processing of haptic information. In an attempt to determine whether there is also evidence of serial processing in humans 10 patients with parietal cortical lesions, three patients with subcortical lesions and one patient after hemispherectomy, were examined. Case-by-case and across subject analysis of lesion type, sensorimotor profile and electrophysiological findings showed that in unihemispheric lesions: (a) there is little impairment of thermesthesia, nociception and vibration sense: (b) two-point discrimination and integrity of the N20 somatosensory component are highly correlated; (c) a loss of the N20 component is accompanied by a severe impairment of stereognosis; (d) conversely, in more posterior lesions astereognosis can occur with an intact N20 component; and (e) if the lesion is in the right hemisphere there is frequently impairment of graphesthesia in both hands. These data are taken to indicate serial processing from SI (as evidenced by an intact N20 component) to posterior parietal cortex allowing progressive spatial and temporal integration. In graphesthesia our data suggest an integrative function of the right parietal cortex for both sides of the body. Other sensory qualities like vibration nociception and thermesthesia are apparently processed in a non-serial, probably parallel way involving both hemispheres. The effects of cerebral lesions in our series suggest parallel as well as serial processing of somesthetic information in man underlying the perception of different haptic features.

  12. Memorability of commands learned as keywords or function keys: A parallel to voice recognition interfaces

    SciTech Connect

    Sorn, K.; Schultz, E.E. Jr.

    1987-09-15

    Voice recognition interfaces require users to input keywords to access and control functions. An experiment was conducted to compare user's memory for keywords relative to names of equivalent function keys. Thirty-five subjects attempted to learn word processing functions as keywords or in terms of the names of function keys which allowed access to and control of these functions. Keyword learning produced a significantly higher proportion of correct recalls and fewer intrusions (false recalls) after both an immediate retention test and an unexpected second test two weeks later. Superior keyword memorability is an important potential advantage of voice recognition interfaces.

  13. Requirements for implementing real-time control functional modules on a hierarchical parallel pipelined system

    NASA Technical Reports Server (NTRS)

    Wheatley, Thomas E.; Michaloski, John L.; Lumia, Ronald

    1989-01-01

    Analysis of a robot control system leads to a broad range of processing requirements. One fundamental requirement of a robot control system is the necessity of a microcomputer system in order to provide sufficient processing capability.The use of multiple processors in a parallel architecture is beneficial for a number of reasons, including better cost performance, modular growth, increased reliability through replication, and flexibility for testing alternate control strategies via different partitioning. A survey of the progression from low level control synchronizing primitives to higher level communication tools is presented. The system communication and control mechanisms of existing robot control systems are compared to the hierarchical control model. The impact of this design methodology on the current robot control systems is explored.

  14. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers.

    PubMed

    Spencer, Sarah J; Tamminen, Manu V; Preheim, Sarah P; Guo, Mira T; Briggs, Adrian W; Brito, Ilana L; A Weitz, David; Pitkänen, Leena K; Vigneault, Francois; Juhani Virta, Marko P; Alm, Eric J

    2016-02-01

    Many microbial communities are characterized by high genetic diversity. 16S ribosomal RNA sequencing can determine community members, and metagenomics can determine the functional diversity, but resolving the functional role of individual cells in high throughput remains an unsolved challenge. Here, we describe epicPCR (Emulsion, Paired Isolation and Concatenation PCR), a new technique that links functional genes and phylogenetic markers in uncultured single cells, providing a throughput of hundreds of thousands of cells with costs comparable to one genomic library preparation. We demonstrate the utility of our technique in a natural environment by profiling a sulfate-reducing community in a freshwater lake, revealing both known sulfate reducers and discovering new putative sulfate reducers. Our method is adaptable to any conserved genetic trait and translates genetic associations from diverse microbial samples into a sequencing library that answers targeted ecological questions. Potential applications include identifying functional community members, tracing horizontal gene transfer networks and mapping ecological interactions between microbial cells. PMID:26394010

  15. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers

    PubMed Central

    Spencer, Sarah J; Tamminen, Manu V; Preheim, Sarah P; Guo, Mira T; Briggs, Adrian W; Brito, Ilana L; A Weitz, David; Pitkänen, Leena K; Vigneault, Francois; Virta, Marko PJuhani; Alm, Eric J

    2016-01-01

    Many microbial communities are characterized by high genetic diversity. 16S ribosomal RNA sequencing can determine community members, and metagenomics can determine the functional diversity, but resolving the functional role of individual cells in high throughput remains an unsolved challenge. Here, we describe epicPCR (Emulsion, Paired Isolation and Concatenation PCR), a new technique that links functional genes and phylogenetic markers in uncultured single cells, providing a throughput of hundreds of thousands of cells with costs comparable to one genomic library preparation. We demonstrate the utility of our technique in a natural environment by profiling a sulfate-reducing community in a freshwater lake, revealing both known sulfate reducers and discovering new putative sulfate reducers. Our method is adaptable to any conserved genetic trait and translates genetic associations from diverse microbial samples into a sequencing library that answers targeted ecological questions. Potential applications include identifying functional community members, tracing horizontal gene transfer networks and mapping ecological interactions between microbial cells. PMID:26394010

  16. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers.

    PubMed

    Spencer, Sarah J; Tamminen, Manu V; Preheim, Sarah P; Guo, Mira T; Briggs, Adrian W; Brito, Ilana L; A Weitz, David; Pitkänen, Leena K; Vigneault, Francois; Juhani Virta, Marko P; Alm, Eric J

    2016-02-01

    Many microbial communities are characterized by high genetic diversity. 16S ribosomal RNA sequencing can determine community members, and metagenomics can determine the functional diversity, but resolving the functional role of individual cells in high throughput remains an unsolved challenge. Here, we describe epicPCR (Emulsion, Paired Isolation and Concatenation PCR), a new technique that links functional genes and phylogenetic markers in uncultured single cells, providing a throughput of hundreds of thousands of cells with costs comparable to one genomic library preparation. We demonstrate the utility of our technique in a natural environment by profiling a sulfate-reducing community in a freshwater lake, revealing both known sulfate reducers and discovering new putative sulfate reducers. Our method is adaptable to any conserved genetic trait and translates genetic associations from diverse microbial samples into a sequencing library that answers targeted ecological questions. Potential applications include identifying functional community members, tracing horizontal gene transfer networks and mapping ecological interactions between microbial cells.

  17. The functional significance of cortical reorganization and the parallel development of CI therapy

    PubMed Central

    Taub, Edward; Uswatte, Gitendra; Mark, Victor W.

    2014-01-01

    For the nineteenth and the better part of the twentieth centuries two correlative beliefs were strongly held by almost all neuroscientists and practitioners in the field of neurorehabilitation. The first was that after maturity the adult CNS was hardwired and fixed, and second that in the chronic phase after CNS injury no substantial recovery of function could take place no matter what intervention was employed. However, in the last part of the twentieth century evidence began to accumulate that neither belief was correct. First, in the 1960s and 1970s, in research with primates given a surgical abolition of somatic sensation from a single forelimb, which rendered the extremity useless, it was found that behavioral techniques could convert the limb into an extremity that could be used extensively. Beginning in the late 1980s, the techniques employed with deafferented monkeys were translated into a rehabilitation treatment, termed Constraint Induced Movement therapy or CI therapy, for substantially improving the motor deficit in humans of the upper and lower extremities in the chronic phase after stroke. CI therapy has been applied successfully to other types of damage to the CNS such as traumatic brain injury, cerebral palsy, multiple sclerosis, and spinal cord injury, and it has also been used to improve function in focal hand dystonia and for aphasia after stroke. As this work was proceeding, it was being shown during the 1980s and 1990s that sustained modulation of afferent input could alter the structure of the CNS and that this topographic reorganization could have relevance to the function of the individual. The alteration in these once fundamental beliefs has given rise to important recent developments in neuroscience and neurorehabilitation and holds promise for further increasing our understanding of CNS function and extending the boundaries of what is possible in neurorehabilitation. PMID:25018720

  18. The functional significance of cortical reorganization and the parallel development of CI therapy.

    PubMed

    Taub, Edward; Uswatte, Gitendra; Mark, Victor W

    2014-01-01

    For the nineteenth and the better part of the twentieth centuries two correlative beliefs were strongly held by almost all neuroscientists and practitioners in the field of neurorehabilitation. The first was that after maturity the adult CNS was hardwired and fixed, and second that in the chronic phase after CNS injury no substantial recovery of function could take place no matter what intervention was employed. However, in the last part of the twentieth century evidence began to accumulate that neither belief was correct. First, in the 1960s and 1970s, in research with primates given a surgical abolition of somatic sensation from a single forelimb, which rendered the extremity useless, it was found that behavioral techniques could convert the limb into an extremity that could be used extensively. Beginning in the late 1980s, the techniques employed with deafferented monkeys were translated into a rehabilitation treatment, termed Constraint Induced Movement therapy or CI therapy, for substantially improving the motor deficit in humans of the upper and lower extremities in the chronic phase after stroke. CI therapy has been applied successfully to other types of damage to the CNS such as traumatic brain injury, cerebral palsy, multiple sclerosis, and spinal cord injury, and it has also been used to improve function in focal hand dystonia and for aphasia after stroke. As this work was proceeding, it was being shown during the 1980s and 1990s that sustained modulation of afferent input could alter the structure of the CNS and that this topographic reorganization could have relevance to the function of the individual. The alteration in these once fundamental beliefs has given rise to important recent developments in neuroscience and neurorehabilitation and holds promise for further increasing our understanding of CNS function and extending the boundaries of what is possible in neurorehabilitation. PMID:25018720

  19. Two Dictyostelium tyrosine kinase–like kinases function in parallel, stress-induced STAT activation pathways

    PubMed Central

    Araki, Tsuyoshi; Vu, Linh Hai; Sasaki, Norimitsu; Kawata, Takefumi; Eichinger, Ludwig; Williams, Jeffrey G.

    2014-01-01

    When Dictyostelium cells are hyperosmotically stressed, STATc is activated by tyrosine phosphorylation. Unusually, activation is regulated by serine phosphorylation and consequent inhibition of a tyrosine phosphatase: PTP3. The identity of the cognate tyrosine kinase is unknown, and we show that two tyrosine kinase–like (TKL) enzymes, Pyk2 and Pyk3, share this function; thus, for stress-induced STATc activation, single null mutants are only marginally impaired, but the double mutant is nonactivatable. When cells are stressed, Pyk2 and Pyk3 undergo increased autocatalytic tyrosine phosphorylation. The site(s) that are generated bind the SH2 domain of STATc, and then STATc becomes the target of further kinase action. The signaling pathways that activate Pyk2 and Pyk3 are only partially overlapping, and there may be a structural basis for this difference because Pyk3 contains both a TKL domain and a pseudokinase domain. The latter functions, like the JH2 domain of metazoan JAKs, as a negative regulator of the kinase domain. The fact that two differently regulated kinases catalyze the same phosphorylation event may facilitate specific targeting because under stress, Pyk3 and Pyk2 accumulate in different parts of the cell; Pyk3 moves from the cytosol to the cortex, whereas Pyk2 accumulates in cytosolic granules that colocalize with PTP3. PMID:25143406

  20. The functional and anatomical organization of marsupial neocortex: Evidence for parallel evolution across mammals

    PubMed Central

    Karlen, Sarah J.; Krubitzer, Leah

    2007-01-01

    Marsupials are a diverse group of mammals that occupy a large range of habitats and have evolved a wide array of unique adaptations. Although they are as diverse as placental mammals, our understanding of marsupial brain organization is more limited. Like placental mammals, marsupials have striking similarities in neocortical organization, such as a constellation of cortical fields including S1, S2, V1, V2, and A1, that are functionally, architectonically, and connectionally distinct. In this review, we describe the general lifestyle and morphological characteristics of all marsupials and the organization of somatosensory, motor, visual, and auditory cortex. For each sensory system, we compare the functional organization and the corticocortical and thalamocortical connections of the neocortex across species. Differences between placental and marsupial species are discussed and the theories on neocortical evolution that have been derived from studying marsupials, particularly the idea of a sensorimotor amalgam, are evaluated. Overall, marsupials inhabit a variety of niches and assume many different lifestyles. For example, marsupials occupy terrestrial, arboreal, burrowing, and aquatic environments; some animals are highly social while others are solitary; and different species are carnivorous, herbivorous, or omnivorous. For each of these adaptations, marsupials have evolved an array of morphological, behavioral, and cortical specializations that are strikingly similar to those observed in placental mammals occupying similar habitats, which indicate that there are constraints imposed on evolving nervous systems that result in recurrent solutions to similar environmental challenges. PMID:17507143

  1. Parallel transmit excitation at 1.5 T based on the minimization of a driving function for device heating

    SciTech Connect

    Gudino, N.; Sonmez, M.; Nielles-Vallespin, S.; Faranesh, A. Z.; Lederman, R. J.; Balaban, R. S.; Hansen, M. S.; Yao, Z.; Baig, T.; Martens, M.; Griswold, M. A.

    2015-01-15

    Purpose: To provide a rapid method to reduce the radiofrequency (RF) E-field coupling and consequent heating in long conductors in an interventional MRI (iMRI) setup. Methods: A driving function for device heating (W) was defined as the integration of the E-field along the direction of the wire and calculated through a quasistatic approximation. Based on this function, the phases of four independently controlled transmit channels were dynamically changed in a 1.5 T MRI scanner. During the different excitation configurations, the RF induced heating in a nitinol wire immersed in a saline phantom was measured by fiber-optic temperature sensing. Additionally, a minimization of W as a function of phase and amplitude values of the different channels and constrained by the homogeneity of the RF excitation field (B{sub 1}) over a region of interest was proposed and its results tested on the benchtop. To analyze the validity of the proposed method, using a model of the array and phantom setup tested in the scanner, RF fields and SAR maps were calculated through finite-difference time-domain (FDTD) simulations. In addition to phantom experiments, RF induced heating of an active guidewire inserted in a swine was also evaluated. Results: In the phantom experiment, heating at the tip of the device was reduced by 92% when replacing the body coil by an optimized parallel transmit excitation with same nominal flip angle. In the benchtop, up to 90% heating reduction was measured when implementing the constrained minimization algorithm with the additional degree of freedom given by independent amplitude control. The computation of the optimum phase and amplitude values was executed in just 12 s using a standard CPU. The results of the FDTD simulations showed similar trend of the local SAR at the tip of the wire and measured temperature as well as to a quadratic function of W, confirming the validity of the quasistatic approach for the presented problem at 64 MHz. Imaging and heating

  2. Parallel transmit excitation at 1.5 T based on the minimization of a driving function for device heating

    PubMed Central

    Gudino, N.; Sonmez, M.; Yao, Z.; Baig, T.; Nielles-Vallespin, S.; Faranesh, A. Z.; Lederman, R. J.; Martens, M.; Balaban, R. S.; Hansen, M. S.; Griswold, M. A.

    2015-01-01

    Purpose: To provide a rapid method to reduce the radiofrequency (RF) E-field coupling and consequent heating in long conductors in an interventional MRI (iMRI) setup. Methods: A driving function for device heating (W) was defined as the integration of the E-field along the direction of the wire and calculated through a quasistatic approximation. Based on this function, the phases of four independently controlled transmit channels were dynamically changed in a 1.5 T MRI scanner. During the different excitation configurations, the RF induced heating in a nitinol wire immersed in a saline phantom was measured by fiber-optic temperature sensing. Additionally, a minimization of W as a function of phase and amplitude values of the different channels and constrained by the homogeneity of the RF excitation field (B1) over a region of interest was proposed and its results tested on the benchtop. To analyze the validity of the proposed method, using a model of the array and phantom setup tested in the scanner, RF fields and SAR maps were calculated through finite-difference time-domain (FDTD) simulations. In addition to phantom experiments, RF induced heating of an active guidewire inserted in a swine was also evaluated. Results: In the phantom experiment, heating at the tip of the device was reduced by 92% when replacing the body coil by an optimized parallel transmit excitation with same nominal flip angle. In the benchtop, up to 90% heating reduction was measured when implementing the constrained minimization algorithm with the additional degree of freedom given by independent amplitude control. The computation of the optimum phase and amplitude values was executed in just 12 s using a standard CPU. The results of the FDTD simulations showed similar trend of the local SAR at the tip of the wire and measured temperature as well as to a quadratic function of W, confirming the validity of the quasistatic approach for the presented problem at 64 MHz. Imaging and heating

  3. Developmental downregulation of GABAergic drive parallels formation of functional synapses in cultured mouse neocortical networks.

    PubMed

    Klueva, Julia; Meis, Susanne; de Lima, Ana D; Voigt, Thomas; Munsch, Thomas

    2008-06-01

    Networks of cortical neurons in vitro spontaneously develop synchronous oscillatory electrical activity at around the second week in culture. However, the underlying mechanisms and in particular the role of GABAergic interneurons in initiation and synchronization of oscillatory activity in developing cortical networks remain elusive. Here, we examined the intrinsic properties and the development of GABAergic and glutamatergic input onto presumed projection neurons (PNs) and large interneurons (L-INs) in cortical cultures of GAD67-GFP mice. Cultures developed spontaneous synchronous activity already at 5-7 days in vitro (DIV), as revealed by imaging transient changes in Fluo-3 fluorescence. Concurrently, spontaneous glutamate-mediated and GABA(A)-mediated postsynaptic currents (sPSCs) occured at 5 DIV. For both types of neurons the frequency of glutamatergic and GABAergic sPSCs increased with DIV, whereas the charge transfer of glutamatergic sPSCs increased and the charge transfer of GABAergic sPSCs decreased with cultivation time. The ratio between GABAergic and the overall charge transfer was significantly reduced with DIV for L-INs and PNs, indicating an overall reduction in GABAergic synaptic drive with maturation of the network. In contrast, analysis of miniature PSCs (mPSCs) revealed no significant changes of charge transfer with DIV for both types of neurons, indicating that the reduction in GABAergic drive was not due to a decreased number of functional synapses. Our data suggest that the global reduction in GABAergic synaptic drive together with more synaptic input to PNs and L-INs during maturation may enhance rhythmogenesis of the network and increase the synchronization at the level of population bursts. PMID:18361402

  4. EUPDF: Eulerian Monte Carlo Probability Density Function Solver for Applications With Parallel Computing, Unstructured Grids, and Sprays

    NASA Technical Reports Server (NTRS)

    Raju, M. S.

    1998-01-01

    The success of any solution methodology used in the study of gas-turbine combustor flows depends a great deal on how well it can model the various complex and rate controlling processes associated with the spray's turbulent transport, mixing, chemical kinetics, evaporation, and spreading rates, as well as convective and radiative heat transfer and other phenomena. The phenomena to be modeled, which are controlled by these processes, often strongly interact with each other at different times and locations. In particular, turbulence plays an important role in determining the rates of mass and heat transfer, chemical reactions, and evaporation in many practical combustion devices. The influence of turbulence in a diffusion flame manifests itself in several forms, ranging from the so-called wrinkled, or stretched, flamelets regime to the distributed combustion regime, depending upon how turbulence interacts with various flame scales. Conventional turbulence models have difficulty treating highly nonlinear reaction rates. A solution procedure based on the composition joint probability density function (PDF) approach holds the promise of modeling various important combustion phenomena relevant to practical combustion devices (such as extinction, blowoff limits, and emissions predictions) because it can account for nonlinear chemical reaction rates without making approximations. In an attempt to advance the state-of-the-art in multidimensional numerical methods, we at the NASA Lewis Research Center extended our previous work on the PDF method to unstructured grids, parallel computing, and sprays. EUPDF, which was developed by M.S. Raju of Nyma, Inc., was designed to be massively parallel and could easily be coupled with any existing gas-phase and/or spray solvers. EUPDF can use an unstructured mesh with mixed triangular, quadrilateral, and/or tetrahedral elements. The application of the PDF method showed favorable results when applied to several supersonic

  5. Super-resolution non-parametric deconvolution in modelling the radial response function of a parallel plate ionization chamber.

    PubMed

    Kulmala, A; Tenhunen, M

    2012-11-01

    The signal of the dosimetric detector is generally dependent on the shape and size of the sensitive volume of the detector. In order to optimize the performance of the detector and reliability of the output signal the effect of the detector size should be corrected or, at least, taken into account. The response of the detector can be modelled using the convolution theorem that connects the system input (actual dose), output (measured result) and the effect of the detector (response function) by a linear convolution operator. We have developed the super-resolution and non-parametric deconvolution method for determination of the cylinder symmetric ionization chamber radial response function. We have demonstrated that the presented deconvolution method is able to determine the radial response for the Roos parallel plate ionization chamber with a better than 0.5 mm correspondence with the physical measures of the chamber. In addition, the performance of the method was proved by the excellent agreement between the output factors of the stereotactic conical collimators (4-20 mm diameter) measured by the Roos chamber, where the detector size is larger than the measured field, and the reference detector (diode). The presented deconvolution method has a potential in providing reference data for more accurate physical models of the ionization chamber as well as for improving and enhancing the performance of the detectors in specific dosimetric problems.

  6. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication.

    PubMed

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimension (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given 'Y' shape microchannel. The key novelty of our approach lies on rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of the polystyrene (PS) particles and cancer cells with different sizes. The filter can be cleaned by reversing the flow and reused for many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips.

  7. A Sparse Self-Consistent Field Algorithm and Its Parallel Implementation: Application to Density-Functional-Based Tight Binding.

    PubMed

    Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias

    2014-06-10

    We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion. PMID:26580754

  8. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication

    PubMed Central

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimension (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’ shape microchannel. The key novelty of our approach lies on rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of the polystyrene (PS) particles and cancer cells with different sizes. The filter can be cleaned by reversing the flow and reused for many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips. PMID:26818119

  9. The ETS Family Transcription Factors Etv5 and PU.1 Function in Parallel To Promote Th9 Cell Development.

    PubMed

    Koh, Byunghee; Hufford, Matthew M; Pham, Duy; Olson, Matthew R; Wu, Tong; Jabeen, Rukhsana; Sun, Xin; Kaplan, Mark H

    2016-09-15

    The IL-9-secreting Th9 subset of CD4 Th cells develop in response to an environment containing IL-4 and TGF-β, promoting allergic disease, autoimmunity, and resistance to pathogens. We previously identified a requirement for the ETS family transcription factor PU.1 in Th9 development. In this report, we demonstrate that the ETS transcription factor ETS variant 5 (ETV5) promotes IL-9 production in Th9 cells by binding and recruiting histone acetyltransferases to the Il9 locus at sites distinct from PU.1. In cells that are deficient in both PU.1 and ETV5 there is lower IL-9 production than in cells lacking either factor alone. In vivo loss of PU.1 and ETV5 in T cells results in distinct effects on allergic inflammation in the lung, suggesting that these factors function in parallel. Together, these data define a role for ETV5 in Th9 development and extend the paradigm of related transcription factors having complementary functions during differentiation.

  10. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication

    NASA Astrophysics Data System (ADS)

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimension (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’ shape microchannel. The key novelty of our approach lies on rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of the polystyrene (PS) particles and cancer cells with different sizes. The filter can be cleaned by reversing the flow and reused for many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips.

  11. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication.

    PubMed

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimension (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given 'Y' shape microchannel. The key novelty of our approach lies on rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of the polystyrene (PS) particles and cancer cells with different sizes. The filter can be cleaned by reversing the flow and reused for many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips. PMID:26818119

  12. Parallel functional activity profiling reveals valvulopathogens are potent 5-hydroxytryptamine(2B) receptor agonists: implications for drug safety assessment.

    PubMed

    Huang, Xi-Ping; Setola, Vincent; Yadav, Prem N; Allen, John A; Rogan, Sarah C; Hanson, Bonnie J; Revankar, Chetana; Robers, Matt; Doucette, Chris; Roth, Bryan L

    2009-10-01

    Drug-induced valvular heart disease (VHD) is a serious side effect of a few medications, including some that are on the market. Pharmacological studies of VHD-associated medications (e.g., fenfluramine, pergolide, methysergide, and cabergoline) have revealed that they and/or their metabolites are potent 5-hydroxytryptamine(2B) (5-HT(2B)) receptor agonists. We have shown that activation of 5-HT(2B) receptors on human heart valve interstitial cells in vitro induces a proliferative response reminiscent of the fibrosis that typifies VHD. To identify current or future drugs that might induce VHD, we screened approximately 2200 U.S. Food and Drug Administration (FDA)-approved or investigational medications to identify 5-HT(2B) receptor agonists, using calcium-based high-throughput screening. Of these 2200 compounds, 27 were 5-HT(2B) receptor agonists (hits); 14 of these had previously been identified as 5-HT(2B) receptor agonists, including seven bona fide valvulopathogens. Six of the hits (guanfacine, quinidine, xylometazoline, oxymetazoline, fenoldopam, and ropinirole) are approved medications. Twenty-three of the hits were then "functionally profiled" (i.e., assayed in parallel for 5-HT(2B) receptor agonism using multiple readouts to test for functional selectivity). In these assays, the known valvulopathogens were efficacious at concentrations as low as 30 nM, whereas the other compounds were less so. Hierarchical clustering analysis of the pEC(50) data revealed that ropinirole (which is not associated with valvulopathy) was clearly segregated from known valvulopathogens. Taken together, our data demonstrate that patterns of 5-HT(2B) receptor functional selectivity might be useful for identifying compounds likely to induce valvular heart disease. PMID:19570945

  13. Functional Traits in Parallel Evolutionary Radiations and Trait-Environment Associations in the Cape Floristic Region of South Africa.

    PubMed

    Mitchell, Nora; Moore, Timothy E; Mollmann, Hayley Kilroy; Carlson, Jane E; Mocko, Kerri; Martinez-Cabrera, Hugo; Adams, Christopher; Silander, John A; Jones, Cynthia S; Schlichting, Carl D; Holsinger, Kent E

    2015-04-01

    Evolutionary radiations with extreme levels of diversity present a unique opportunity to study the role of the environment in plant evolution. If environmental adaptation played an important role in such radiations, we expect to find associations between functional traits and key climatic variables. Similar trait-environment associations across clades may reflect common responses, while contradictory associations may suggest lineage-specific adaptations. Here, we explore trait-environment relationships in two evolutionary radiations in the fynbos biome of the highly biodiverse Cape Floristic Region (CFR) of South Africa. Protea and Pelargonium are morphologically and evolutionarily diverse genera that typify the CFR yet are substantially different in growth form and morphology. Our analytical approach employs a Bayesian multiple-response generalized linear mixed-effects model, taking into account covariation among traits and controlling for phylogenetic relationships. Of the pairwise trait-environment associations tested, 6 out of 24 were in the same direction and 2 out of 24 were in opposite directions, with the latter apparently reflecting alternative life-history strategies. These findings demonstrate that trait diversity within two plant lineages may reflect both parallel and idiosyncratic responses to the environment, rather than all taxa conforming to a global-scale pattern. Such insights are essential for understanding how trait-environment associations arise and how they influence species diversification.

  14. Parallelization and improvements of the generalized born model with a simple sWitching function for modern graphics processors.

    PubMed

    Arthur, Evan J; Brooks, Charles L

    2016-04-15

    Two fundamental challenges of simulating biologically relevant systems are the rapid calculation of the energy of solvation and the trajectory length of a given simulation. The Generalized Born model with a Simple sWitching function (GBSW) addresses these issues by using an efficient approximation of Poisson-Boltzmann (PB) theory to calculate each solute atom's free energy of solvation, the gradient of this potential, and the subsequent forces of solvation without the need for explicit solvent molecules. This study presents a parallel refactoring of the original GBSW algorithm and its implementation on newly available, low cost graphics chips with thousands of processing cores. Depending on the system size and nonbonded force cutoffs, the new GBSW algorithm offers speed increases of between one and two orders of magnitude over previous implementations while maintaining similar levels of accuracy. We find that much of the algorithm scales linearly with an increase of system size, which makes this water model cost effective for solvating large systems. Additionally, we utilize our GPU-accelerated GBSW model to fold the model system chignolin, and in doing so we demonstrate that these speed enhancements now make accessible folding studies of peptides and potentially small proteins.

  15. Functional Traits in Parallel Evolutionary Radiations and Trait-Environment Associations in the Cape Floristic Region of South Africa.

    PubMed

    Mitchell, Nora; Moore, Timothy E; Mollmann, Hayley Kilroy; Carlson, Jane E; Mocko, Kerri; Martinez-Cabrera, Hugo; Adams, Christopher; Silander, John A; Jones, Cynthia S; Schlichting, Carl D; Holsinger, Kent E

    2015-04-01

    Evolutionary radiations with extreme levels of diversity present a unique opportunity to study the role of the environment in plant evolution. If environmental adaptation played an important role in such radiations, we expect to find associations between functional traits and key climatic variables. Similar trait-environment associations across clades may reflect common responses, while contradictory associations may suggest lineage-specific adaptations. Here, we explore trait-environment relationships in two evolutionary radiations in the fynbos biome of the highly biodiverse Cape Floristic Region (CFR) of South Africa. Protea and Pelargonium are morphologically and evolutionarily diverse genera that typify the CFR yet are substantially different in growth form and morphology. Our analytical approach employs a Bayesian multiple-response generalized linear mixed-effects model, taking into account covariation among traits and controlling for phylogenetic relationships. Of the pairwise trait-environment associations tested, 6 out of 24 were in the same direction and 2 out of 24 were in opposite directions, with the latter apparently reflecting alternative life-history strategies. These findings demonstrate that trait diversity within two plant lineages may reflect both parallel and idiosyncratic responses to the environment, rather than all taxa conforming to a global-scale pattern. Such insights are essential for understanding how trait-environment associations arise and how they influence species diversification. PMID:25811086

  16. Biphasic Decline of β-Cell Function with Age in Euglycemic Non-Obese Diabetic (NOD) Mice Parallels Diabetes Onset

    PubMed Central

    Cechin, Sirlene R.; Lopez-Ocejo, Omar; Karpinsky-Semper, Darla; Buchwald, Peter

    2015-01-01

    A gradual decline in insulin response is known to precede the onset of type 1 diabetes (T1D). To track age-related changes in the β-cell function of non-obese diabetic (NOD) mice, the most commonly used animal model for T1D, and to establish differences between those who do and do not become hyperglycemic, we performed a long-term longitudinal oral glucose tolerance test (OGTT) study (10–42 weeks) in combination with immunofluorescence imaging of islet morphology and cell proliferation. We observed a clear biphasic decline in insulin secretion (AUC0–30min) even in euglycemic animals. A first phase (10–28 weeks) consisted of a relatively rapid decline and paralleled diabetes development in the same cohort of animals. This was followed by a second phase (29–42 weeks) during which insulin secretion declined much slower while no additional animals became diabetic. Blood glucose profiles showed a corresponding, but less pronounced change: the area under the concentration curve (AUC0–150min) increased with age, and fit with a bilinear model indicated a rate-change in the trendline around 28 weeks. In control NOD scids, no such changes were observed. Islet morphology also changed with age as islets become surrounded by mononuclear infiltrates, and, in all mice, islets with immune cell infiltration around them showed increased β-cell proliferation. In conclusion, insulin secretion declines in a biphasic manner in all NOD mice. This trend, as well as increased β-cell proliferation, is present even in the NODs that never become diabetic, whereas, it is absent in control NOD scid mice. PMID:26099053

  17. Parallel rendering

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  18. Eclipse Parallel Tools Platform

    2005-02-18

    Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs i these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices,more » and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to providing a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framwork by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration, support for a small number of parallel architectures

  19. Massively parallel visualization: Parallel rendering

    SciTech Connect

    Hansen, C.D.; Krogh, M.; White, W.

    1995-12-01

    This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderer use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  20. Dynamic multi-swarm particle swarm optimizer using parallel PC cluster systems for global optimization of large-scale multimodal functions

    NASA Astrophysics Data System (ADS)

    Fan, Shu-Kai S.; Chang, Ju-Ming

    2010-05-01

    This article presents a novel parallel multi-swarm optimization (PMSO) algorithm with the aim of enhancing the search ability of standard single-swarm PSOs for global optimization of very large-scale multimodal functions. Different from the existing multi-swarm structures, the multiple swarms work in parallel, and the search space is partitioned evenly and dynamically assigned in a weighted manner via the roulette wheel selection (RWS) mechanism. This parallel, distributed framework of the PMSO algorithm is developed based on a master-slave paradigm, which is implemented on a cluster of PCs using message passing interface (MPI) for information interchange among swarms. The PMSO algorithm handles multiple swarms simultaneously and each swarm performs PSO operations of its own independently. In particular, one swarm is designated for global search and the others are for local search. The first part of the experimental comparison is made among the PMSO, standard PSO, and two state-of-the-art algorithms (CTSS and CLPSO) in terms of various un-rotated and rotated benchmark functions taken from the literature. In the second part, the proposed multi-swarm algorithm is tested on large-scale multimodal benchmark functions up to 300 dimensions. The results of the PMSO algorithm show great promise in solving high-dimensional problems.

  1. Parallel pipelining

    SciTech Connect

    Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.

    1995-09-01

    In this paper the authors introduce the idea of parallel pipelining for water lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as N = (R/r){sup {alpha}}, where r and R are the radii of the small and large pipes, respectively, and {alpha} = 4 or 19/7 when the lubricating water flow is laminar or turbulent.

  2. Analysis of speedup as function of block size and cluster size for parallel feed-forward neural networks on a Beowulf cluster.

    PubMed

    Mörchen, Fabian

    2004-03-01

    The performance of feed-forward neural networks trained with the backpropagation algorithm on a dedicated Beowulf cluster is analyzed. The concept of training set parallelism is applied. A new model for run time and speedup prediction is developed. With the model the speedup and efficiency of one iteration of the neural networks can be estimated as a function of block size and cluster size. The model is applied to three example problems representing different applications and network architectures. The estimation of the model has a higher accuracy than traditional methods for run time estimation and can be efficiently calculated. Experiments show that speedup of one iteration does not necessarily translate to a shorter training time toward a given error level. To overcome this problem a heuristic extension to training set parallelism called weight averaging is developed. The results show that training in parallel should only be done on clusters with high performance network connections or a multiprocessor machine. A rule of thumb is given for how much network performance of the cluster is needed to achieve speedup of the training time for a neural network.

  3. rasterEngine: an easy-to-use R function for applying complex geostatistical models to raster datasets in a parallel computing environment

    NASA Astrophysics Data System (ADS)

    Greenberg, J. A.

    2013-12-01

    As geospatial analyses progress in tandem with increasing availability of large complex geographic data sets and high performance computing (HPC), there is an increasing gap in the ability of end-user tools to take advantage of these advances. Specifically, the practical implementation of complex statistical models on large gridded geographic datasets (e.g. remote sensing analysis, species distribution mapping, topographic transformations, and local neighborhood analyses) currently requires a significant knowledge base. A user must be proficient in the chosen model as well as the nuances of scientific programming, raster data models, memory management, parallel computing, and system design. This is further complicated by the fact that many of the cutting-edge analytical tools were developed for non-geospatial datasets and are not part of standard GIS packages, but are available in scientific computing languages such as R and MATLAB. We present a computing function 'rasterEngine' written in the R scientific computing language and part of the CRAN package 'spatial.tools' with these challenges in mind. The goal of rasterEngine is to allow a user to quickly develop and apply analytical models within the R computing environment to arbitrarily large gridded datasets, taking advantage of available parallel computing resources, and without requiring a deep understanding of HPC and raster data models. We provide several examples of rasterEngine being used to solve common grid based analyses, including remote sensing image analyses, topographic transformations, and species distribution modeling. With each example, the parallel processing performance results are presented.

  4. Accelerating the performance of a novel meshless method based on collocation with radial basis functions by employing a graphical processing unit as a parallel coprocessor

    NASA Astrophysics Data System (ADS)

    Owusu-Banson, Derek

    In recent times, a variety of industries, applications and numerical methods including the meshless method have enjoyed a great deal of success by utilizing the graphical processing unit (GPU) as a parallel coprocessor. These benefits often include performance improvement over the previous implementations. Furthermore, applications running on graphics processors enjoy superior performance per dollar and performance per watt than implementations built exclusively on traditional central processing technologies. The GPU was originally designed for graphics acceleration but the modern GPU, known as the General Purpose Graphical Processing Unit (GPGPU) can be used for scientific and engineering calculations. The GPGPU consists of massively parallel array of integer and floating point processors. There are typically hundreds of processors per graphics card with dedicated high-speed memory. This work describes an application written by the author, titled GaussianRBF to show the implementation and results of a novel meshless method that in-cooperates the collocation of the Gaussian radial basis function by utilizing the GPU as a parallel co-processor. Key phases of the proposed meshless method have been executed on the GPU using the NVIDIA CUDA software development kit. Especially, the matrix fill and solution phases have been carried out on the GPU, along with some post processing. This approach resulted in a decreased processing time compared to similar algorithm implemented on the CPU while maintaining the same accuracy.

  5. HNF4 and HNF1 as well as a panel of hepatic functions are extinguished and reexpressed in parallel in chromosomally reduced rat hepatoma-human fibroblast hybrids

    PubMed Central

    1993-01-01

    Rat hepatoma-human fibroblast hybrids of two independent lineages containing only 8-11 human chromosomes show pleiotropic extinction of thirteen out of fifteen hepatic functions examined. Reexpression of the entire group of functions most often occurs in a block, and except for one discordant subclone, correlates with loss of human chromosome 2. The extinguished cells and their reexpressing derivatives have been examined for the expression of seven liver-enriched transcription factors. C/EBP, LAP, DBP, HNF3, and vHNF1 expression are not systematically extinguished in parallel with the hepatic functions. However, HNF1 and HNF4 show a perfect correlation with phenotype: these factors are expressed only in the cells showing pleiotropic reexpression. Since recent evidence indicates that HNF4 controls HNF1 expression, it can be proposed that the HNF4 gene is the primary target of the pleiotropic extinguisher. PMID:8491780

  6. Left Ventricular Function Evaluation on a 3T MR Scanner with Parallel RF Transmission Technique: Prospective Comparison of Cine Sequences Acquired before and after Gadolinium Injection

    PubMed Central

    Caspar, Thibault; Schultz, Anthony; Schaeffer, Mickaël; Labani, Aïssam; Jeung, Mi-Young; Jurgens, Paul Thomas; El Ghannudi, Soraya; Roy, Catherine; Ohana, Mickaël

    2016-01-01

    Objectives To compare cine MR b-TFE sequences acquired before and after gadolinium injection, on a 3T scanner with a parallel RF transmission technique in order to potentially improve scanning time efficiency when evaluating LV function. Methods 25 consecutive patients scheduled for a cardiac MRI were prospectively included and had their b-TFE cine sequences acquired before and right after gadobutrol injection. Images were assessed qualitatively (overall image quality, LV edge sharpness, artifacts and LV wall motion) and quantitatively with measurement of LVEF, LV mass, and telediastolic volume and contrast-to-noise ratio (CNR) between the myocardium and the cardiac chamber. Statistical analysis was conducted using a Bayesian paradigm. Results No difference was found before or after injection for the LVEF, LV mass and telediastolic volume evaluations. Overall image quality and CNR were significantly lower after injection (estimated coefficient cine after > cine before gadolinium: -1.75 CI = [-3.78;-0.0305], prob(coef>0) = 0% and -0.23 CI = [-0.49;0.04], prob(coef>0) = 4%) respectively), but this decrease did not affect the visual assessment of LV wall motion (cine after > cine before gadolinium: -1.46 CI = [-4.72;1.13], prob(coef>0) = 15%). Conclusions In 3T cardiac MRI acquired with parallel RF transmission technique, qualitative and quantitative assessment of LV function can reliably be performed with cine sequences acquired after gadolinium injection, despite a significant decrease in the CNR and the overall image quality. PMID:27669571

  7. Plastic and Genetic Variation in Wing Loading as a Function of Temperature Within and Among Parallel Clines in Drosophila subobscura.

    PubMed

    Gilchrist, George W; Huey, Raymond B

    2004-12-01

    Drosophila subobscura is a European (EU) species that was introduced into South America (SA) approximately 25 years ago. Previous studies have found rapid clinal evolution in wing size and in chromosome inversion frequency in the SA colonists, and these clines parallel those found among the ancestral EU populations. Here we examine thermoplastic changes in wing length in flies reared at 15, 20, and 25°C from 10 populations on each continent. Wings are plastically largest in flies reared at 15°C (the coldest temperature) and genetically largest from populations that experience cooler temperatures on both continents. We hypothesize that flies living in cold temperatures benefit from reduced wing loading: ectotherms with cold muscles generate less power per wing beat, and hence larger wings and/or a smaller mass would facilitate fight. We develop a simple null model, based on isometric growth, to test our hypothesis. We find that both EU and SA flies exhibit adaptive plasticity in wing loading: flies reared at 15°C generally have lower wing loadings than do flies reared at 20°C or 25°C. Clinal patterns, however, are strikingly different. The ancestral EU populations show adaptive clinal variation at rearing a temperature of 15°C: flies from cool climates have lower wing loadings. In the colonizing populations from SA, however, we cannot reject the null model: wing loading increases with decreasing clinal temperatures. Our data suggest that selective factors other than flight have favored the rapid evolution of large overall size at low environmental temperatures. However, selection for increased flight ability in such environments may secondarily favor reduced body mass.

  8. Parallel Total Energy

    2004-10-21

    This is a total energy electronic structure code using Local Density Approximation (LDA) of the density funtional theory. It uses the plane wave as the wave function basis set. It can sue both the norm conserving pseudopotentials and the ultra soft pseudopotentials. It can relax the atomic positions according to the total energy. It is a parallel code using MP1.

  9. Functional development of mechanosensitive hair cells in stem cell-derived organoids parallels native vestibular hair cells.

    PubMed

    Liu, Xiao-Ping; Koehler, Karl R; Mikosz, Andrew M; Hashino, Eri; Holt, Jeffrey R

    2016-01-01

    Inner ear sensory epithelia contain mechanosensitive hair cells that transmit information to the brain through innervation with bipolar neurons. Mammalian hair cells do not regenerate and are limited in number. Here we investigate the potential to generate mechanosensitive hair cells from mouse embryonic stem cells in a three-dimensional (3D) culture system. The system faithfully recapitulates mouse inner ear induction followed by self-guided development into organoids that morphologically resemble inner ear vestibular organs. We find that organoid hair cells acquire mechanosensitivity equivalent to functionally mature hair cells in postnatal mice. The organoid hair cells also progress through a similar dynamic developmental pattern of ion channel expression, reminiscent of two subtypes of native vestibular hair cells. We conclude that our 3D culture system can generate large numbers of fully functional sensory cells which could be used to investigate mechanisms of inner ear development and disease as well as regenerative mechanisms for inner ear repair.

  10. Functional development of mechanosensitive hair cells in stem cell-derived organoids parallels native vestibular hair cells

    PubMed Central

    Liu, Xiao-Ping; Koehler, Karl R.; Mikosz, Andrew M.; Hashino, Eri; Holt, Jeffrey R.

    2016-01-01

    Inner ear sensory epithelia contain mechanosensitive hair cells that transmit information to the brain through innervation with bipolar neurons. Mammalian hair cells do not regenerate and are limited in number. Here we investigate the potential to generate mechanosensitive hair cells from mouse embryonic stem cells in a three-dimensional (3D) culture system. The system faithfully recapitulates mouse inner ear induction followed by self-guided development into organoids that morphologically resemble inner ear vestibular organs. We find that organoid hair cells acquire mechanosensitivity equivalent to functionally mature hair cells in postnatal mice. The organoid hair cells also progress through a similar dynamic developmental pattern of ion channel expression, reminiscent of two subtypes of native vestibular hair cells. We conclude that our 3D culture system can generate large numbers of fully functional sensory cells which could be used to investigate mechanisms of inner ear development and disease as well as regenerative mechanisms for inner ear repair. PMID:27215798

  11. Parallel Information Processing.

    ERIC Educational Resources Information Center

    Rasmussen, Edie M.

    1992-01-01

    Examines parallel computer architecture and the use of parallel processors for text. Topics discussed include parallel algorithms; performance evaluation; parallel information processing; parallel access methods for text; parallel and distributed information retrieval systems; parallel hardware for text; and network models for information…

  12. A parallel implementation of the analytic nuclear gradient for time-dependent density functional theory within the Tamm-Dancoff approximation

    NASA Astrophysics Data System (ADS)

    Liu, Fenglai; Gan, Zhengting; Shao, Yihan; Hsu, Chao-Ping; Dreuw, Andreas; Head-Gordon, Martin; Miller, Benjamin T.; Brooks, Bernard R.; Yu, Jian-Guo; Furlani, Thomas R.; Kong, Jing

    2010-10-01

    We derived the analytic gradient for the excitation energies from a time-dependent density functional theory calculation within the Tamm-Dancoff approximation (TDDFT/TDA) using Gaussian atomic orbital basis sets, and introduced an efficient serial and parallel implementation. Some timing results are shown from a B3LYP/6-31G**/SG-1-grid calculation on zincporphyrin. We also performed TDDFT/TDA geometry optimizations for low-lying excited states of 20 small molecules, and compared adiabatic excitation energies and optimized geometry parameters to experimental values using the B3LYP and ωB97 functionals. There are only minor differences between TDDFT and TDA optimized excited state geometries and adiabatic excitation energies. Optimized bond lengths are in better agreement with experiment for both functionals than either CC2 or SOS-CIS(D0), while adiabatic excitation energies are in similar or slightly poorer agreement. Optimized bond angles with both functionals are more accurate than CIS values, but less accurate than either CC2 or SOS-CIS(D0) ones.

  13. Non-local dynamic solution of two parallel cracks in a functionally graded piezoelectric material under harmonic anti-plane shear wave

    NASA Astrophysics Data System (ADS)

    Liu, Hai-Tao; Sang, Jian-Bing; Zhou, Zhen-Gong

    2016-10-01

    This paper investigates a functionally graded piezoelectric material (FGPM) containing two parallel cracks under harmonic anti-plane shear stress wave based on the non-local theory. The electric permeable boundary condition is considered. To overcome the mathematical difficulty, a one-dimensional non-local kernel is used instead of a two-dimensional one for the dynamic fracture problem to obtain the stress and the electric displacement fields near the crack tips. The problem is formulated through Fourier transform into two pairs of dual-integral equations, in which the unknown variables are jumps of displacements across the crack surfaces. Different from the classical solutions, that the present solution exhibits no stress and electric displacement singularities at the crack tips.

  14. NOCA-1 functions with γ-tubulin and in parallel to Patronin to assemble non-centrosomal microtubule arrays in C. elegans.

    PubMed

    Wang, Shaohe; Wu, Di; Quintin, Sophie; Green, Rebecca A; Cheerambathur, Dhanya K; Ochoa, Stacy D; Desai, Arshad; Oegema, Karen

    2015-09-15

    Non-centrosomal microtubule arrays assemble in differentiated tissues to perform mechanical and transport-based functions. In this study, we identify Caenorhabditis elegans NOCA-1 as a protein with homology to vertebrate ninein. NOCA-1 contributes to the assembly of non-centrosomal microtubule arrays in multiple tissues. In the larval epidermis, NOCA-1 functions redundantly with the minus end protection factor Patronin/PTRN-1 to assemble a circumferential microtubule array essential for worm growth and morphogenesis. Controlled degradation of a γ-tubulin complex subunit in this tissue revealed that γ-tubulin acts with NOCA-1 in parallel to Patronin/PTRN-1. In the germline, NOCA-1 and γ-tubulin co-localize at the cell surface, and inhibiting either leads to a microtubule assembly defect. γ-tubulin targets independently of NOCA-1, but NOCA-1 targeting requires γ-tubulin when a non-essential putatively palmitoylated cysteine is mutated. These results show that NOCA-1 acts with γ-tubulin to assemble non-centrosomal arrays in multiple tissues and highlight functional overlap between the ninein and Patronin protein families.

  15. NOCA-1 functions with γ-tubulin and in parallel to Patronin to assemble non-centrosomal microtubule arrays in C. elegans

    PubMed Central

    Wang, Shaohe; Wu, Di; Quintin, Sophie; Green, Rebecca A; Cheerambathur, Dhanya K; Ochoa, Stacy D; Desai, Arshad; Oegema, Karen

    2015-01-01

    Non-centrosomal microtubule arrays assemble in differentiated tissues to perform mechanical and transport-based functions. In this study, we identify Caenorhabditis elegans NOCA-1 as a protein with homology to vertebrate ninein. NOCA-1 contributes to the assembly of non-centrosomal microtubule arrays in multiple tissues. In the larval epidermis, NOCA-1 functions redundantly with the minus end protection factor Patronin/PTRN-1 to assemble a circumferential microtubule array essential for worm growth and morphogenesis. Controlled degradation of a γ-tubulin complex subunit in this tissue revealed that γ-tubulin acts with NOCA-1 in parallel to Patronin/PTRN-1. In the germline, NOCA-1 and γ-tubulin co-localize at the cell surface, and inhibiting either leads to a microtubule assembly defect. γ-tubulin targets independently of NOCA-1, but NOCA-1 targeting requires γ-tubulin when a non-essential putatively palmitoylated cysteine is mutated. These results show that NOCA-1 acts with γ-tubulin to assemble non-centrosomal arrays in multiple tissues and highlight functional overlap between the ninein and Patronin protein families. DOI: http://dx.doi.org/10.7554/eLife.08649.001 PMID:26371552

  16. Fracture problem for an external circumferential crack in a functionally graded superconducting cylinder subjected to a parallel magnetic field

    NASA Astrophysics Data System (ADS)

    Yan, Z.; Gao, S. W.; Feng, W. J.

    2016-02-01

    In this study, the multiple isoparametric finite element method (MIFEM) is used to investigate external circumferential crack problem of a functionally graded superconducting cylinder subjected to electromagnetic forces. The superconducting cylinder is composed by Bi2223/Ag composite with material parameters varying. A crack reference region is defined to reflect the effects of crack on flux and current densities, and the magnetically impermeable crack surface condition and the generalized Irie-Yamafuji critical state model outside the crack region are adopted. The distributions of magnetic flux density in the superconducting cylinder are obtained analytically for both the zero-field cooling (ZFC) and the field cooling (FC) activation processes. Based on the MIFEM, the stress intensity factors (SIFs) at crack fronts in the process of field ascent and/or descent are then numerically calculated. It is interesting to note from numerical results that for the present crack model in the ZFC activation process, the crack is easily propagate and grow with the applied field increases, and that in the field descent process of either the ZFC case or FC case, the crack generally does not propagate. In addition, in the field ascent process of the ZFC case, the SIFs depend on not only the crack depths and model parameters but also the applied field. The present study should be helpful to the design and application of high-temperature superconductors with external edge cracks.

  17. Massively parallel patterning of complex 2D and 3D functional polymer brushes by polymer pen lithography.

    PubMed

    Xie, Zhuang; Chen, Chaojian; Zhou, Xuechang; Gao, Tingting; Liu, Danqing; Miao, Qian; Zheng, Zijian

    2014-08-13

    We report the first demonstration of centimeter-area serial patterning of complex 2D and 3D functional polymer brushes by high-throughput polymer pen lithography. Arbitrary 2D and 3D structures of poly(glycidyl methacrylate) (PGMA) brushes are fabricated over areas as large as 2 cm × 1 cm, with a remarkable throughput being 3 orders of magnitudes higher than the state-of-the-arts. Patterned PGMA brushes are further employed as resist for fabricating Au micro/nanostructures and hard molds for the subsequent replica molding of soft stamps. On the other hand, these 2D and 3D PGMA brushes are also utilized as robust and versatile platforms for the immobilization of bioactive molecules to form 2D and 3D patterned DNA oligonucleotide and protein chips. Therefore, this low-cost, yet high-throughput "bench-top" serial fabrication method can be readily applied to a wide range of fields including micro/nanofabrication, optics and electronics, smart surfaces, and biorelated studies.

  18. Optimizing parallel reduction operations

    SciTech Connect

    Denton, S.M.

    1995-06-01

    A parallel program consists of sets of concurrent and sequential tasks. Often, a reduction (such as array sum) sequentially combines values produced by a parallel computation. Because reductions occur so frequently in otherwise parallel programs, they are good candidates for optimization. Since reductions may introduce dependencies, most languages separate computation and reduction. The Sisal functional language is unique in that reduction is a natural consequence of loop expressions; the parallelism is implicit in the language. Unfortunately, the original language supports only seven reduction operations. To generalize these expressions, the Sisal 90 definition adds user-defined reductions at the language level. Applicable optimizations depend upon the mathematical properties of the reduction. Compilation and execution speed, synchronization overhead, memory use and maximum size influence the final implementation. This paper (1) Defines reduction syntax and compares with traditional concurrent methods; (2) Defines classes of reduction operations; (3) Develops analysis of classes for optimized concurrency; (4) Incorporates reductions into Sisal 1.2 and Sisal 90; (5) Evaluates performance and size of the implementations.

  19. Parallel Programming in the Age of Ubiquitous Parallelism

    NASA Astrophysics Data System (ADS)

    Pingali, Keshav

    2014-04-01

    Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs

  20. Ultrascalable petaflop parallel supercomputer

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  1. Special parallel processing workshop

    SciTech Connect

    1994-12-01

    This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concept detailing with parallel processing.

  2. Matpar: Parallel Extensions for MATLAB

    NASA Technical Reports Server (NTRS)

    Springer, P. L.

    1998-01-01

    Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.

  3. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory. and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP`s abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume render use a MIMD approach. Implementations for these algorithms are presented for the Thinking Ma.chines Corporation CM-5 MPP.

  4. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  5. MPP parallel forth

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    1987-01-01

    Massively Parallel Processor (MPP) Parallel FORTH is a derivative of FORTH-83 and Unified Software Systems' Uni-FORTH. The extension of FORTH into the realm of parallel processing on the MPP is described. With few exceptions, Parallel FORTH was made to follow the description of Uni-FORTH as closely as possible. Likewise, the parallel FORTH extensions were designed as philosophically similar to serial FORTH as possible. The MPP hardware characteristics, as viewed by the FORTH programmer, is discussed. Then a description is presented of how parallel FORTH is implemented on the MPP.

  6. Parallel flow diffusion battery

    DOEpatents

    Yeh, H.C.; Cheng, Y.S.

    1984-01-01

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  7. Parallel flow diffusion battery

    DOEpatents

    Yeh, Hsu-Chi; Cheng, Yung-Sung

    1984-08-07

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  8. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  9. Verbal and Visual Parallelism

    ERIC Educational Resources Information Center

    Fahnestock, Jeanne

    2003-01-01

    This study investigates the practice of presenting multiple supporting examples in parallel form. The elements of parallelism and its use in argument were first illustrated by Aristotle. Although real texts may depart from the ideal form for presenting multiple examples, rhetorical theory offers a rationale for minimal, parallel presentation. The…

  10. Parallel algorithm development

    SciTech Connect

    Adams, T.F.

    1996-06-01

    Rapid changes in parallel computing technology are causing significant changes in the strategies being used for parallel algorithm development. One approach is simply to write computer code in a standard language like FORTRAN 77 or with the expectation that the compiler will produce executable code that will run in parallel. The alternatives are: (1) to build explicit message passing directly into the source code; or (2) to write source code without explicit reference to message passing or parallelism, but use a general communications library to provide efficient parallel execution. Application of these strategies is illustrated with examples of codes currently under development.

  11. Parallel Implicit Algorithms for CFD

    NASA Technical Reports Server (NTRS)

    Keyes, David E.

    1998-01-01

    The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD.) "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in Massively Parallel Integration (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.

  12. Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

    SciTech Connect

    Azmy, Y.Y.; Barnett, D.A.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates were updated to function in a UNI-COS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  13. Multitasking TORT under UNICOS: Parallel performance models and measurements

    SciTech Connect

    Barnett, A.; Azmy, Y.Y.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  14. Linearly exact parallel closures for slab geometry

    SciTech Connect

    Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

    2013-08-15

    Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients)

  15. Linearly exact parallel closures for slab geometry

    NASA Astrophysics Data System (ADS)

    Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

    2013-08-01

    Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).

  16. Parallel digital forensics infrastructure.

    SciTech Connect

    Liebrock, Lorie M.; Duggan, David Patrick

    2009-10-01

    This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the implementation of the parallel digital forensics (PDF) infrastructure architecture and implementation.

  17. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also be used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results show from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.

  18. Parallel MR Imaging

    PubMed Central

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A.; Seiberlich, Nicole

    2015-01-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the under-sampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. PMID:22696125

  19. A parallel variable metric optimization algorithm

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.

    1973-01-01

    An algorithm, designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.

  20. Social Problems and Deviance: Some Parallel Issues

    ERIC Educational Resources Information Center

    Kitsuse, John I.; Spector, Malcolm

    1975-01-01

    Explores parallel developments in labeling theory and in the value conflict approach to social problems. Similarities in their critiques of functionalism and etiological theory as well as their emphasis on the definitional process are noted. (Author)

  1. Parallel Lisp simulator

    SciTech Connect

    Weening, J.S.

    1988-05-01

    CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.

  2. Parallel computing works

    SciTech Connect

    Not Available

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  3. Totally parallel multilevel algorithms

    NASA Technical Reports Server (NTRS)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which are referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  4. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.

  5. The AIS-5000 parallel processor

    SciTech Connect

    Schmitt, L.A.; Wilson, S.S.

    1988-05-01

    The AIS-5000 is a commercially available massively parallel processor which has been designed to operate in an industrial environment. It has fine-grained parallelism with up to 1024 processing elements arranged in a single-instruction multiple-data (SIMD) architecture. The processing elements are arranged in a one-dimensional chain that, for computer vision applications, can be as wide as the image itself. This architecture has superior cost/performance characteristics than two-dimensional mesh-connected systems. The design of the processing elements and their interconnections as well as the software used to program the system allow a wide variety of algorithms and applications to be implemented. In this paper, the overall architecture of the system is described. Various components of the system are discussed, including details of the processing elements, data I/O pathways and parallel memory organization. A virtual two-dimensional model for programming image-based algorithms for the system is presented. This model is supported by the AIS-5000 hardware and software and allows the system to be treated as a full-image-size, two-dimensional, mesh-connected parallel processor. Performance bench marks are given for certain simple and complex functions.

  6. Trajectories in parallel optics.

    PubMed

    Klapp, Iftach; Sochen, Nir; Mendlovic, David

    2011-10-01

    In our previous work we showed the ability to improve the optical system's matrix condition by optical design, thereby improving its robustness to noise. It was shown that by using singular value decomposition, a target point-spread function (PSF) matrix can be defined for an auxiliary optical system, which works parallel to the original system to achieve such an improvement. In this paper, after briefly introducing the all optics implementation of the auxiliary system, we show a method to decompose the target PSF matrix. This is done through a series of shifted responses of auxiliary optics (named trajectories), where a complicated hardware filter is replaced by postprocessing. This process manipulates the pixel confined PSF response of simple auxiliary optics, which in turn creates an auxiliary system with the required PSF matrix. This method is simulated on two space variant systems and reduces their system condition number from 18,598 to 197 and from 87,640 to 5.75, respectively. We perform a study of the latter result and show significant improvement in image restoration performance, in comparison to a system without auxiliary optics and to other previously suggested hybrid solutions. Image restoration results show that in a range of low signal-to-noise ratio values, the trajectories method gives a significant advantage over alternative approaches. A third space invariant study case is explored only briefly, and we present a significant improvement in the matrix condition number from 1.9160e+013 to 34,526.

  7. Bilingual parallel programming

    SciTech Connect

    Foster, I.; Overbeek, R.

    1990-01-01

    Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides and effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.

  8. Advanced parallel processing with supercomputer architectures

    SciTech Connect

    Hwang, K.

    1987-10-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers.

  9. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  10. The Function of the Glutamate-Nitric Oxide-cGMP Pathway in Brain in Vivo and Learning Ability Decrease in Parallel in Mature Compared with Young Rats

    ERIC Educational Resources Information Center

    Piedrafita, Blanca; Cauli, Omar; Montoliu, Carmina; Felipo, Vicente

    2007-01-01

    Aging is associated with cognitive impairment, but the underlying mechanisms remain unclear. We have recently reported that the ability of rats to learn a Y-maze conditional discrimination task depends on the function of the glutamate-nitric oxide-cGMP pathway in brain. The aims of the present work were to assess whether the ability of rats to…

  11. Expression of mitochondrial regulatory genes parallels respiratory capacity and contractile function in a rat model of hypoxia-induced right ventricular hypertrophy

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Chronic hypobaric hypoxia (CHH) increases load on the right ventricle (RV) resulting in RV hypertrophy. We hypothesized that CHH elicits distinct responses, i.e., the hypertrophied RV, unlike the left ventricle (LV), displaying enhanced mitochondrial respiratory and contractile function. Wistar rats...

  12. Processing Semblances Induced through Inter-Postsynaptic Functional LINKs, Presumed Biological Parallels of K-Lines Proposed for Building Artificial Intelligence

    PubMed Central

    Vadakkan, Kunjumon I.

    2011-01-01

    The internal sensation of memory, which is available only to the owner of an individual nervous system, is difficult to analyze for its basic elements of operation. We hypothesize that associative learning induces the formation of functional LINK between the postsynapses. During memory retrieval, the activation of either postsynapse re-activates the functional LINK evoking a semblance of sensory activity arriving at its opposite postsynapse, nature of which defines the basic unit of internal sensation – namely, the semblion. In neuronal networks that undergo continuous oscillatory activity at certain levels of their organization re-activation of functional LINKs is expected to induce semblions, enabling the system to continuously learn, self-organize, and demonstrate instantiation, features that can be utilized for developing artificial intelligence (AI). This paper also explains suitability of the inter-postsynaptic functional LINKs to meet the expectations of Minsky’s K-lines, basic elements of a memory theory generated to develop AI and methods to replicate semblances outside the nervous system. PMID:21845180

  13. The Parallel Axiom

    ERIC Educational Resources Information Center

    Rogers, Pat

    1972-01-01

    Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclids parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)

  14. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. In includes both tutorial and reference material. It also presents the basic concepts that underly PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).

  15. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth

  16. Parallel image compression

    NASA Technical Reports Server (NTRS)

    Reif, John H.

    1987-01-01

    A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless test compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.

  17. Artificial intelligence in parallel

    SciTech Connect

    Waldrop, M.M.

    1984-08-10

    The current rage in the Artificial Intelligence (AI) community is parallelism: the idea is to build machines with many independent processors doing many things at once. The upshot is that about a dozen parallel machines are now under development for AI alone. As might be expected, the approaches are diverse yet there are a number of fundamental issues in common: granularity, topology, control, and algorithms.

  18. Continuous parallel coordinates.

    PubMed

    Heinrich, Julian; Weiskopf, Daniel

    2009-01-01

    Typical scientific data is represented on a grid with appropriate interpolation or approximation schemes,defined on a continuous domain. The visualization of such data in parallel coordinates may reveal patterns latently contained in the data and thus can improve the understanding of multidimensional relations. In this paper, we adopt the concept of continuous scatterplots for the visualization of spatially continuous input data to derive a density model for parallel coordinates. Based on the point-line duality between scatterplots and parallel coordinates, we propose a mathematical model that maps density from a continuous scatterplot to parallel coordinates and present different algorithms for both numerical and analytical computation of the resulting density field. In addition, we show how the 2-D model can be used to successively construct continuous parallel coordinates with an arbitrary number of dimensions. Since continuous parallel coordinates interpolate data values within grid cells, a scalable and dense visualization is achieved, which will be demonstrated for typical multi-variate scientific data.

  19. Parallel reduction in expression, but no loss of functional constraint, in two opsin paralogs within cave populations of Gammarus minus (Crustacea: Amphipoda)

    PubMed Central

    2013-01-01

    Background Gammarus minus, a freshwater amphipod living in the cave and surface streams in the eastern USA, is a premier candidate for studying the evolution of troglomorphic traits such as pigmentation loss, elongated appendages, and reduced eyes. In G. minus, multiple pairs of genetically related, physically proximate cave and surface populations exist which exhibit a high degree of intraspecific morphological divergence. The morphology, ecology, and genetic structure of these sister populations are well characterized, yet the genetic basis of their morphological divergence remains unknown. Results We used degenerate PCR primers designed to amplify opsin genes within the subphylum Crustacea and discovered two distinct opsin paralogs (average inter-paralog protein divergence ≈ 20%) in the genome of three independently derived pairs of G. minus cave and surface populations. Both opsin paralogs were found to be related to other crustacean middle wavelength sensitive opsins. Low levels of nucleotide sequence variation (< 1% within populations) were detected in both opsin genes, regardless of habitat, and dN/dS ratios did not indicate a relaxation of functional constraint in the cave populations with reduced or absent eyes. Maximum likelihood analyses using codon-based models also did not detect a relaxation of functional constraint in the cave lineages. We quantified expression level of both opsin genes and found that the expression of both paralogs was significantly reduced in all three cave populations relative to their sister surface populations. Conclusions The concordantly lowered expression level of both opsin genes in cave populations of G. minus compared to sister surface populations, combined with evidence for persistent purifying selection in the cave populations, is consistent with an unspecified pleiotropic function of opsin proteins. Our results indicate that phototransduction proteins such as opsins may have retained their function in cave

  20. Finite element computation with parallel VLSI

    NASA Technical Reports Server (NTRS)

    Mcgregor, J.; Salama, M.

    1983-01-01

    This paper describes a parallel processing computer consisting of a 16-bit microcomputer as a master processor which controls and coordinates the activities of 8086/8087 VLSI chip set slave processors working in parallel. The hardware is inexpensive and can be flexibly configured and programmed to perform various functions. This makes it a useful research tool for the development of, and experimentation with parallel mathematical algorithms. Application of the hardware to computational tasks involved in the finite element analysis method is demonstrated by the generation and assembly of beam finite element stiffness matrices. A number of possible schemes for the implementation of N-elements on N- or n-processors (N is greater than n) are described, and the speedup factors of their time consumption are determined as a function of the number of available parallel processors.

  1. HOPSPACK: Hybrid Optimization Parallel Search Package.

    SciTech Connect

    Gray, Genetha Anne.; Kolda, Tamara G.; Griffin, Joshua; Taddy, Matt; Martinez-Canales, Monica L.

    2008-12-01

    In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel SearchPackage), a new software platform which facilitates combining multiple optimization routines into asingle, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The frameworkis designed such that existing optimization source code can be easily incorporated with minimalcode modification. By maintaining the integrity of each individual solver, the strengths and codesophistication of the original optimization package are retained and exploited.4

  2. Parallel re-modeling of EF-1α function: divergent EF-1α genes co-occur with EFL genes in diverse distantly related eukaryotes

    PubMed Central

    2013-01-01

    Background Elongation factor-1α (EF-1α) and elongation factor-like (EFL) proteins are functionally homologous to one another, and are core components of the eukaryotic translation machinery. The patchy distribution of the two elongation factor types across global eukaryotic phylogeny is suggestive of a ‘differential loss’ hypothesis that assumes that EF-1α and EFL were present in the most recent common ancestor of eukaryotes followed by independent differential losses of one of the two factors in the descendant lineages. To date, however, just one diatom and one fungus have been found to have both EF-1α and EFL (dual-EF-containing species). Results In this study, we characterized 35 new EF-1α/EFL sequences from phylogenetically diverse eukaryotes. In so doing we identified 11 previously unreported dual-EF-containing species from diverse eukaryote groups including the Stramenopiles, Apusomonadida, Goniomonadida, and Fungi. Phylogenetic analyses suggested vertical inheritance of both genes in each of the dual-EF lineages. In the dual-EF-containing species we identified, the EF-1α genes appeared to be highly divergent in sequence and suppressed at the transcriptional level compared to the co-occurring EFL genes. Conclusions According to the known EF-1α/EFL distribution, the differential loss process should have occurred independently in diverse eukaryotic lineages, and more dual-EF-containing species remain unidentified. We predict that dual-EF-containing species retain the divergent EF-1α homologues only for a sub-set of the original functions. As the dual-EF-containing species are distantly related to each other, we propose that independent re-modelling of EF-1α function took place in multiple branches in the tree of eukaryotes. PMID:23800323

  3. Parallel time integration software

    SciTech Connect

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non) linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integrarion techniques is limited to spatial parallelism. However, current trends in computer architectures are leading twards system with more, but not faster. processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic poerators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three sparial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.

  4. Parallel time integration software

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non) linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integrarion techniques is limited to spatial parallelism. However, current trends in computer architectures are leading twards system with more, but not faster. processors. Therefore, faster compute speeds mustmore » come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic poerators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three sparial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.« less

  5. Parallelism in System Tools

    SciTech Connect

    Matney, Sr., Kenneth D; Shipman, Galen M

    2010-01-01

    The Cray XT, when employed in conjunction with the Lustre filesystem, has provided the ability to generate huge amounts of data in the form of many files. Typically, this is accommodated by satisfying the requests of large numbers of Lustre clients in parallel. In contrast, a single service node (Lustre client) cannot adequately service such datasets. This means that the use of traditional UNIX tools like cp, tar, et alli (with have no parallel capability) can result in substantial impact to user productivity. For example, to copy a 10 TB dataset from the service node using cp would take about 24 hours, under more or less ideal conditions. During production operation, this could easily extend to 36 hours. In this paper, we introduce the Lustre User Toolkit for Cray XT, developed at the Oak Ridge Leadership Computing Facility (OLCF). We will show that Linux commands, implementing highly parallel I/O algorithms, provide orders of magnitude greater performance, greatly reducing impact to productivity.

  6. Parallel optical sampler

    DOEpatents

    Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A

    2014-05-20

    An optical sampler includes a first and second 1.times.n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode interconnected Mach-Zehnder Modulator. A method of sampling the optical analog input signal is disclosed.

  7. Parallel programming with Ada

    SciTech Connect

    Kok, J.

    1988-01-01

    To the human programmer the ease of coding distributed computing is highly dependent on the suitability of the employed programming language. But with a particular language it is also important whether the possibilities of one or more parallel architectures can efficiently be addressed by available language constructs. In this paper the possibilities are discussed of the high-level language Ada and in particular of its tasking concept as a descriptional tool for the design and implementation of numerical and other algorithms that allow execution of parts in parallel. Language tools are explained and their use for common applications is shown. Conclusions are drawn about the usefulness of several Ada concepts.

  8. The NAS Parallel Benchmarks

    SciTech Connect

    Bailey, David H.

    2009-11-15

    The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, LeoDagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage

  9. SPINning parallel systems software.

    SciTech Connect

    Matlin, O.S.; Lusk, E.; McCune, W.

    2002-03-15

    We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin.

  10. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

  11. CaZF, a plant transcription factor functions through and parallel to HOG and calcineurin pathways in Saccharomyces cerevisiae to provide osmotolerance.

    PubMed

    Jain, Deepti; Roy, Nilanjan; Chattopadhyay, Debasis

    2009-01-01

    Salt-sensitive yeast mutants were deployed to characterize a gene encoding a C2H2 zinc finger protein (CaZF) that is differentially expressed in a drought-tolerant variety of chickpea (Cicer arietinum) and provides salinity-tolerance in transgenic tobacco. In Saccharomyces cerevisiae most of the cellular responses to hyper-osmotic stress is regulated by two interconnected pathways involving high osmolarity glycerol mitogen-activated protein kinase (Hog1p) and Calcineurin (CAN), a Ca(2+)/calmodulin-regulated protein phosphatase 2B. In this study, we report that heterologous expression of CaZF provides osmotolerance in S. cerevisiae through Hog1p and Calcineurin dependent as well as independent pathways. CaZF partially suppresses salt-hypersensitive phenotypes of hog1, can and hog1can mutants and in conjunction, stimulates HOG and CAN pathway genes with subsequent accumulation of glycerol in absence of Hog1p and CAN. CaZF directly binds to stress response element (STRE) to activate STRE-containing promoter in yeast. Transactivation and salt tolerance assays of CaZF deletion mutants showed that other than the transactivation domain a C-terminal domain composed of acidic and basic amino acids is also required for its function. Altogether, results from this study suggests that CaZF is a potential plant salt-tolerance determinant and also provide evidence that in budding yeast expression of HOG and CAN pathway genes can be stimulated in absence of their regulatory enzymes to provide osmotolerance.

  12. Fus3p and Kss1p control G1 arrest in Saccharomyces cerevisiae through a balance of distinct arrest and proliferative functions that operate in parallel with Far1p.

    PubMed Central

    Cherkasova, V; Lyons, D M; Elion, E A

    1999-01-01

    In Saccharomyces cerevisiae, mating pheromones activate two MAP kinases (MAPKs), Fus3p and Kss1p, to induce G1 arrest prior to mating. Fus3p is known to promote G1 arrest by activating Far1p, which inhibits three Clnp/Cdc28p kinases. To analyze the contribution of Fus3p and Kss1p to G1 arrest that is independent of Far1p, we constructed far1 CLN strains that undergo G1 arrest from increased activation of the mating MAP kinase pathway. We find that Fus3p and Kss1p both control G1 arrest through multiple functions that operate in parallel with Far1p. Fus3p and Kss1p together promote G1 arrest by repressing transcription of G1/S cyclin genes (CLN1, CLN2, CLB5) by a mechanism that blocks their activation by Cln3p/Cdc28p kinase. In addition, Fus3p and Kss1p counteract G1 arrest through overlapping and distinct functions. Fus3p and Kss1p together increase the expression of CLN3 and PCL2 genes that promote budding, and Kss1p inhibits the MAP kinase cascade. Strikingly, Fus3p promotes proliferation by a novel function that is not linked to reduced Ste12p activity or increased levels of Cln2p/Cdc28p kinase. Genetic analysis suggests that Fus3p promotes proliferation through activation of Mcm1p transcription factor that upregulates numerous genes in G1 phase. Thus, Fus3p and Kss1p control G1 arrest through a balance of arrest functions that inhibit the Cdc28p machinery and proliferative functions that bypass this inhibition. PMID:10049917

  13. Asynchronous parallel pattern search for nonlinear optimization

    SciTech Connect

    P. D. Hough; T. G. Kolda; V. J. Torczon

    2000-01-01

    Parallel pattern search (PPS) can be quite useful for engineering optimization problems characterized by a small number of variables (say 10--50) and by expensive objective function evaluations such as complex simulations that take from minutes to hours to run. However, PPS, which was originally designed for execution on homogeneous and tightly-coupled parallel machine, is not well suited to the more heterogeneous, loosely-coupled, and even fault-prone parallel systems available today. Specifically, PPS is hindered by synchronization penalties and cannot recover in the event of a failure. The authors introduce a new asynchronous and fault tolerant parallel pattern search (AAPS) method and demonstrate its effectiveness on both simple test problems as well as some engineering optimization problems

  14. High performance parallel architectures

    SciTech Connect

    Anderson, R.E. )

    1989-09-01

    In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user programmer's point-of-view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.

  15. Parallel Multigrid Equation Solver

    2001-09-07

    Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved problems of up to 76 mullion degrees of feedom, problems in linear elasticity on the ASCI blue pacific and ASCI red machines.

  16. Optical parallel selectionist systems

    NASA Astrophysics Data System (ADS)

    Caulfield, H. John

    1993-01-01

    There are at least two major classes of computers in nature and technology: connectionist and selectionist. A subset of connectionist systems (Turing Machines) dominates modern computing, although another subset (Neural Networks) is growing rapidly. Selectionist machines have unique capabilities which should allow them to do truly creative operations. It is possible to make a parallel optical selectionist system using methods describes in this paper.

  17. Parallel fast gauss transform

    SciTech Connect

    Sampath, Rahul S; Sundar, Hari; Veerapaneni, Shravan

    2010-01-01

    We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N{sup 2}) time. The parallel time complexity estimates for our algorithms are O(N/n{sub p}) for uniform point distributions and O( (N/n{sub p}) log (N/n{sub p}) + n{sub p}log n{sub p}) for non-uniform distributions using n{sub p} CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits 'diagonal translation'. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer. Our implementation is 'kernel-independent' and can handle other 'Gaussian-type' kernels even when explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.

  18. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and Cthat allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs. ani.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  19. Massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Fung, L. W. (Inventor)

    1983-01-01

    An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.

  20. Parallel hierarchical global illumination

    SciTech Connect

    Snell, Q.O.

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  1. Parallel hierarchical radiosity rendering

    SciTech Connect

    Carter, M.

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  2. Parallel Dislocation Simulator

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  3. LEWICE droplet trajectory calculations on a parallel computer

    NASA Technical Reports Server (NTRS)

    Caruso, Steven C.

    1993-01-01

    A parallel computer implementation (128 processors) of LEWICE, a NASA Lewis code used to predict the time-dependent ice accretion process for two-dimensional aerodynamic bodies of simple geometries, is described. Two-dimensional parallel droplet trajectory calculations are performed to demonstrate the potential benefits of applying parallel processing to ice accretion analysis. Parallel performance is evaluated as a function of the number of trajectories and the number of processors. For comparison, similar trajectory calculations are performed on single-processor Cray computers, and the best parallel results are found to be 33 and 23 times faster, respectively, than those of the Cray XMP and YMP.

  4. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  5. A low-fat yoghurt supplemented with a rooster comb extract on muscle joint function in adults with mild knee pain: a randomized, double blind, parallel, placebo-controlled, clinical trial of efficacy.

    PubMed

    Solà, Rosa; Valls, Rosa-Maria; Martorell, Isabel; Giralt, Montserrat; Pedret, Anna; Taltavull, Núria; Romeu, Marta; Rodríguez, Àurea; Moriña, David; Lopez de Frutos, Victor; Montero, Manuel; Casajuana, Maria-Carmen; Pérez, Laura; Faba, Jenny; Bernal, Gloria; Astilleros, Anna; González, Roser; Puiggrós, Francesc; Arola, Lluís; Chetrit, Carlos; Martinez-Puig, Daniel

    2015-11-01

    Preliminary results suggested that oral-administration of rooster comb extract (RCE) rich in hyaluronic acid (HA) was associated with improved muscle strength. Following these promising results, the objective of the present study was to evaluate the effect of low-fat yoghurt supplemented with RCE rich in HA on muscle function in adults with mild knee pain; a symptom of early osteoarthritis. Participants (n = 40) received low-fat yoghurt (125 mL d(-1)) supplemented with 80 mg d(-1) of RCE and the placebo group (n = 40) consumed the same yoghurt without the RCE, in a randomized, controlled, double-blind, parallel trial over 12 weeks. Using an isokinetic dynamometer (Biodex System 4), RCE consumption, compared to control, increased the affected knee peak torque, total work and mean power at 180° s(-1), at least 11% in men (p < 0.05) with no differences in women. No dietary differences were noted. These results suggest that long-term consumption of low-fat yoghurt supplemented with RCE could be a dietary tool to improve muscle strength in men, associated with possible clinical significance. However, further studies are needed to elucidate reasons for these sex difference responses observed, and may provide further insight into muscle function.

  6. Solution-phase parallel synthesis of a pharmacophore library of HUN-7293 analogues: a general chemical mutagenesis approach to defining structure-function properties of naturally occurring cyclic (depsi)peptides.

    PubMed

    Chen, Yan; Bilban, Melitta; Foster, Carolyn A; Boger, Dale L

    2002-05-15

    HUN-7293 (1), a naturally occurring cyclic heptadepsipeptide, is a potent inhibitor of cell adhesion molecule expression (VCAM-1, ICAM-1, E-selectin), the overexpression of which is characteristic of chronic inflammatory diseases. Representative of a general approach to defining structure-function relationships of such cyclic (depsi)peptides, the parallel synthesis and evaluation of a complete library of key HUN-7293 analogues are detailed enlisting solution-phase techniques and simple acid-base liquid-liquid extractions for isolation and purification of intermediates and final products. Significant to the design of the studies and unique to solution-phase techniques, the library was assembled superimposing a divergent synthetic strategy onto a convergent total synthesis. An alanine scan and N-methyl deletion of each residue of the cyclic heptadepsipeptide identified key sites responsible for or contributing to the biological properties. The simultaneous preparation of a complete set of individual residue analogues further simplifying the structure allowed an assessment of each structural feature of 1, providing a detailed account of the structure-function relationships in a single study. Within this pharmacophore library prepared by systematic chemical mutagenesis of the natural product structure, simplified analogues possessing comparable potency and, in some instances, improved selectivity were identified. One potent member of this library proved to be an additional natural product in its own right, which we have come to refer to as HUN-7293B (8), being isolated from the microbial strain F/94-499709.

  7. Hybrid Optimization Parallel Search PACKage

    2009-11-10

    HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework providesmore » a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.« less

  8. Parallel computers and parallel algorithms for CFD: An introduction

    NASA Astrophysics Data System (ADS)

    Roose, Dirk; Vandriessche, Rafael

    1995-10-01

    This text presents a tutorial on those aspects of parallel computing that are important for the development of efficient parallel algorithms and software for computational fluid dynamics. We first review the main architectural features of parallel computers and we briefly describe some parallel systems on the market today. We introduce some important concepts concerning the development and the performance evaluation of parallel algorithms. We discuss how work load imbalance and communication costs on distributed memory parallel computers can be minimized. We present performance results for some CFD test cases. We focus on applications using structured and block structured grids, but the concepts and techniques are also valid for unstructured grids.

  9. Parallel Consensual Neural Networks

    NASA Technical Reports Server (NTRS)

    Benediktsson, J. A.; Sveinsson, J. R.; Ersoy, O. K.; Swain, P. H.

    1993-01-01

    A new neural network architecture is proposed and applied in classification of remote sensing/geographic data from multiple sources. The new architecture is called the parallel consensual neural network and its relation to hierarchical and ensemble neural networks is discussed. The parallel consensual neural network architecture is based on statistical consensus theory. The input data are transformed several times and the different transformed data are applied as if they were independent inputs and are classified using stage neural networks. Finally, the outputs from the stage networks are then weighted and combined to make a decision. Experimental results based on remote sensing data and geographic data are given. The performance of the consensual neural network architecture is compared to that of a two-layer (one hidden layer) conjugate-gradient backpropagation neural network. The results with the proposed neural network architecture compare favorably in terms of classification accuracy to the backpropagation method.

  10. Parallel Subconvolution Filtering Architectures

    NASA Technical Reports Server (NTRS)

    Gray, Andrew A.

    2003-01-01

    These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete- Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than on the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.

  11. Parallel grid population

    SciTech Connect

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.

  12. Parallel Anisotropic Tetrahedral Adaptation

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Darmofal, David L.

    2008-01-01

    An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic is described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000-1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.

  13. Parallel multilevel preconditioners

    SciTech Connect

    Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.

    1989-01-01

    In this paper, we shall report on some techniques for the development of preconditioners for the discrete systems which arise in the approximation of solutions to elliptic boundary value problems. Here we shall only state the resulting theorems. It has been demonstrated that preconditioned iteration techniques often lead to the most computationally effective algorithms for the solution of the large algebraic systems corresponding to boundary value problems in two and three dimensional Euclidean space. The use of preconditioned iteration will become even more important on computers with parallel architecture. This paper discusses an approach for developing completely parallel multilevel preconditioners. In order to illustrate the resulting algorithms, we shall describe the simplest application of the technique to a model elliptic problem.

  14. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Gryphon, Coranth D.; Miller, Mark D.

    1991-01-01

    PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.

  15. Homology, convergence and parallelism.

    PubMed

    Ghiselin, Michael T

    2016-01-01

    Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. PMID:26598721

  16. Collisionless parallel shocks

    NASA Technical Reports Server (NTRS)

    Khabibrakhmanov, I. KH.; Galeev, A. A.; Galinskii, V. L.

    1993-01-01

    Consideration is given to a collisionless parallel shock based on solitary-type solutions of the modified derivative nonlinear Schroedinger equation (MDNLS) for parallel Alfven waves. The standard derivative nonlinear Schroedinger equation is generalized in order to include the possible anisotropy of the plasma distribution and higher-order Korteweg-de Vies-type dispersion. Stationary solutions of MDNLS are discussed. The anisotropic nature of 'adiabatic' reflections leads to the asymmetric particle distribution in the upstream as well as in the downstream regions of the shock. As a result, nonzero heat flux appears near the front of the shock. It is shown that this causes the stochastic behavior of the nonlinear waves, which can significantly contribute to the shock thermalization.

  17. ASSEMBLY OF PARALLEL PLATES

    DOEpatents

    Groh, E.F.; Lennox, D.H.

    1963-04-23

    This invention is concerned with a rigid assembly of parallel plates in which keyways are stamped out along the edges of the plates and a self-retaining key is inserted into aligned keyways. Spacers having similar keyways are included between adjacent plates. The entire assembly is locked into a rigid structure by fastening only the outermost plates to the ends of the keys. (AEC)

  18. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Painter, J.; Hansen, C.

    1996-10-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the M.

  19. Xyce parallel electronic simulator.

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  20. Trajectory optimization using parallel shooting method on parallel computer

    SciTech Connect

    Wirthman, D.J.; Park, S.Y.; Vadali, S.R.

    1995-03-01

    The efficiency of a parallel shooting method on a parallel computer for solving a variety of optimal control guidance problems is studied. Several examples are considered to demonstrate that a speedup of nearly 7 to 1 is achieved with the use of 16 processors. It is suggested that further improvements in performance can be achieved by parallelizing in the state domain. 10 refs.

  1. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    SciTech Connect

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving`s meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia`s {open_quotes}tiling{close_quotes} dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  2. Human liver alcohol dehydrogenase: amino acid substitution in the beta 2 beta 2 Oriental isozyme explains functional properties, establishes an active site structure, and parallels mutational exchanges in the yeast enzyme.

    PubMed Central

    Jörnvall, H; Hempel, J; Vallee, B L; Bosron, W F; Li, T K

    1984-01-01

    The homodimeric Oriental beta 2 beta 2 isozyme of human liver alcohol dehydrogenase, corresponding to an allelic variant at the ADH2 gene locus, was studied in order to define the amino acid exchange in relation to the beta 1 beta 1 isozyme, the predominant allelic form among Caucasians. Sequence analysis reveals that the amino acid substitution occurs at position 7 of the largest CNBr fragment, corresponding to position 47 of the whole protein chain. Here, the beta 2 form has a histidine residue, while, in common with other characterized mammalian liver alcohol dehydrogenases, the beta 1 form has an arginine residue. This exchange does not affect the adjacent cysteine-46 residue, which is a protein ligand to the active-site zinc atom, thus clarifying previously inconsistent results. The histidine/arginine-47 mutational replacement corresponds to a position that binds the pyrophosphate group of the coenzyme NAD(H); this explains the functional differences between the beta 1 beta 1 and beta 2 beta 2 isozymes, including both a lower pH optimum and higher turnover number of beta 2 beta 2, which is likely to be the mutant form. The exchange demonstrates the existence of parallel but separate mutations in the evolution of alcohol dehydrogenases because these mammalian enzymes differ at exactly the same position by the same type of substitution as is found between a mutant and the wild-type constitutive forms of the corresponding yeast enzyme. PMID:6374651

  3. Resistor Combinations for Parallel Circuits.

    ERIC Educational Resources Information Center

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)

  4. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.

  5. Asynchronous interpretation of parallel microprograms

    SciTech Connect

    Bandman, O.L.

    1984-03-01

    In this article, the authors demonstrate how to pass from a given synchronous interpretation of a parallel microprogram to an equivalent asynchronous interpretation, and investigate the cost associated with the rejection of external synchronization in parallel microprogram structures.

  6. Global Arrays Parallel Programming Toolkit

    SciTech Connect

    Nieplocha, Jaroslaw; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Harrison, Robert J.; Chavarría-Miranda, Daniel

    2011-01-01

    The two predominant classes of programming models for parallel computing are distributed memory and shared memory. Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers this characteristic can have a negative impact on performance and scalability. Careful code restructuring to increase data reuse and replacing fine grain load/stores with block access to shared data can address the problem and yield performance for shared memory that is competitive with message-passing. However, this performance comes at the cost of compromising the ease of use that the shared memory model advertises. Distributed memory models, such as message-passing or one-sided communication, offer performance and scalability but they are difficult to program. The Global Arrays toolkit attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be specified by the programmer and hence managed. GA is related to the global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran. In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF, ZPL, and Data Parallel C. However, the Global Array programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving

  7. Parallel Pascal - An extended Pascal for parallel computers

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.

    1984-01-01

    Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.

  8. Parallelized nested sampling

    NASA Astrophysics Data System (ADS)

    Henderson, R. Wesley; Goggans, Paul M.

    2014-12-01

    One of the important advantages of nested sampling as an MCMC technique is its ability to draw representative samples from multimodal distributions and distributions with other degeneracies. This coverage is accomplished by maintaining a number of so-called live samples within a likelihood constraint. In usual practice, at each step, only the sample with the least likelihood is discarded from this set of live samples and replaced. In [1], Skilling shows that for a given number of live samples, discarding only one sample yields the highest precision in estimation of the log-evidence. However, if we increase the number of live samples, more samples can be discarded at once while still maintaining the same precision. For computer code running only serially, this modification would considerably increase the wall clock time necessary to reach convergence. However, if we use a computer with parallel processing capabilities, and we write our code to take advantage of this parallelism to replace multiple samples concurrently, the performance penalty can be eliminated entirely and possibly reversed. In this case, we must use the more general equation in [1] for computing the expectation of the shrinkage distribution: E [- log t]= (N r-r+1)-1+(Nr-r+2)-1+⋯+Nr-1, for shrinkage t with Nr live samples and r samples discarded at each iteration. The equation for the variance Var (- log t)= (N r-r+1)-2+(Nr-r+2)-2+⋯+Nr-2 is used to find the appropriate number of live samples Nr to use with r > 1 to match the variance achieved with N1 live samples and r = 1. In this paper, we show that by replacing multiple discarded samples in parallel, we are able to achieve a more thorough sampling of the constrained prior distribution, reduce runtime, and increase precision.

  9. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Hansen, C.; Painter, J.; de Verdiere, G.C.

    1995-05-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.

  10. Parallel Eclipse Project Checkout

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.

    2011-01-01

    Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (xml file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to checkout the plug-ins, this program makes the plug-in checkouts parallel. After parsing the feature, a request to checkout for each plug-in in the feature has been inserted. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to checkout now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer s home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer s home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. It can be applied to any

  11. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

  12. Parallel Kinematic Machines (PKM)

    SciTech Connect

    Henry, R.S.

    2000-03-17

    The purpose of this 3-year cooperative research project was to develop a parallel kinematic machining (PKM) capability for complex parts that normally require expensive multiple setups on conventional orthogonal machine tools. This non-conventional, non-orthogonal machining approach is based on a 6-axis positioning system commonly referred to as a hexapod. Sandia National Laboratories/New Mexico (SNL/NM) was the lead site responsible for a multitude of projects that defined the machining parameters and detailed the metrology of the hexapod. The role of the Kansas City Plant (KCP) in this project was limited to evaluating the application of this unique technology to production applications.

  13. CSM parallel structural methods research

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.

    1989-01-01

    Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.

  14. Roo: A parallel theorem prover

    SciTech Connect

    Lusk, E.L.; McCune, W.W.; Slaney, J.K.

    1991-11-01

    We describe a parallel theorem prover based on the Argonne theorem-proving system OTTER. The parallel system, called Roo, runs on shared-memory multiprocessors such as the Sequent Symmetry. We explain the parallel algorithm used and give performance results that demonstrate near-linear speedups on large problems.

  15. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  16. Programming parallel architectures - The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  17. Electron parallel closures for arbitrary collisionality

    SciTech Connect

    Ji, Jeong-Young Held, Eric D.

    2014-12-15

    Electron parallel closures for heat flow, viscosity, and friction force are expressed as kernel-weighted integrals of thermodynamic drives, the temperature gradient, relative electron-ion flow velocity, and flow-velocity gradient. Simple, fitted kernel functions are obtained for arbitrary collisionality from the 6400 moment solution and the asymptotic behavior in the collisionless limit. The fitted kernels circumvent having to solve higher order moment equations in order to close the electron fluid equations. For this reason, the electron parallel closures provide a useful and general tool for theoretical and computational models of astrophysical and laboratory plasmas.

  18. Constructing higher order DNA origami arrays using DNA junctions of anti-parallel/parallel double crossovers

    NASA Astrophysics Data System (ADS)

    Ma, Zhipeng; Park, Seongsu; Yamashita, Naoki; Kawai, Kentaro; Hirai, Yoshikazu; Tsuchiya, Toshiyuki; Tabata, Osamu

    2016-06-01

    DNA origami provides a versatile method for the construction of nanostructures with defined shape, size and other properties; such nanostructures may enable a hierarchical assembly of large scale architecture for the placement of other nanomaterials with atomic precision. However, the effective use of these higher order structures as functional components depends on knowledge of their assembly behavior and mechanical properties. This paper demonstrates construction of higher order DNA origami arrays with controlled orientations based on the formation of two types of DNA junctions: anti-parallel and parallel double crossovers. A two-step assembly process, in which preformed rectangular DNA origami monomer structures themselves undergo further self-assembly to form numerically unlimited arrays, was investigated to reveal the influences of assembly parameters. AFM observations showed that when parallel double crossover DNA junctions are used, the assembly of DNA origami arrays occurs with fewer monomers than for structures formed using anti-parallel double crossovers, given the same assembly parameters, indicating that the configuration of parallel double crossovers is not energetically preferred. However, the direct measurement by AFM force-controlled mapping shows that both DNA junctions of anti-parallel and parallel double crossovers have homogeneous mechanical stability with any part of DNA origami.

  19. Massively Parallel QCD

    SciTech Connect

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-04-11

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.

  20. Parallel ptychographic reconstruction

    PubMed Central

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-01-01

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It is able to be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source. PMID:25607174

  1. Making parallel lines meet

    PubMed Central

    Baskin, Tobias I.; Gu, Ying

    2012-01-01

    The extracellular matrix is constructed beyond the plasma membrane, challenging mechanisms for its control by the cell. In plants, the cell wall is highly ordered, with cellulose microfibrils aligned coherently over a scale spanning hundreds of cells. To a considerable extent, deploying aligned microfibrils determines mechanical properties of the cell wall, including strength and compliance. Cellulose microfibrils have long been seen to be aligned in parallel with an array of microtubules in the cell cortex. How do these cortical microtubules affect the cellulose synthase complex? This question has stood for as many years as the parallelism between the elements has been observed, but now an answer is emerging. Here, we review recent work establishing that the link between microtubules and microfibrils is mediated by a protein named cellulose synthase-interacting protein 1 (CSI1). The protein binds both microtubules and components of the cellulose synthase complex. In the absence of CSI1, microfibrils are synthesized but their alignment becomes uncoupled from the microtubules, an effect that is phenocopied in the wild type by depolymerizing the microtubules. The characterization of CSI1 significantly enhances knowledge of how cellulose is aligned, a process that serves as a paradigmatic example of how cells dictate the construction of their extracellular environment. PMID:22902763

  2. Tolerant (parallel) Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

  3. Applied Parallel Metadata Indexing

    SciTech Connect

    Jacobi, Michael R

    2012-08-01

    The GPFS Archive is parallel archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and searach the archive using standard and user-defined metadata, while maintaining security. last summer, I developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, the author implemented security and developed the user interfae for the search tool. To meet security requirements, each database table is associated with a single user, which only stores records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.

  4. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.

  5. A systolic array parallelizing compiler

    SciTech Connect

    Tseng, P.S. )

    1990-01-01

    This book presents a completely new approach to the problem of systolic array parallelizing compiler. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler which can generate efficient parallel code for complete LINPACK routines. This book begins by analyzing the architectural strength of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. A complete listing of the source program and compiler-generated parallel code are given to clarify the overall picture of the compiler. The book concludes that systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.

  6. Parallel algorithms for optical digital computers

    SciTech Connect

    Huang, A.

    1983-01-01

    Conventional computers suffer from several communication bottlenecks which fundamentally limit their performance. These bottlenecks are characterised by an address-dependent sequential transfer of information which arises from the need to time-multiplex information over a limited number of interconnections. An optical digital computer based on a classical finite state machine can be shown to be free of these bottlenecks. Such a processor would be unique since it would be capable of modifying its entire state space each cycle while conventional computers can only alter a few bits. New algorithms are needed to manage and use this capability. A technique based on recognising a particular symbol in parallel and replacing it in parallel with another symbol is suggested. Examples using this parallel symbolic substitution to perform binary addition and binary incrementation are presented. Applications involving Boolean logic, functional programming languages, production rule driven artificial intelligence, and molecular chemistry are also discussed. 12 references.

  7. Parallelization of ITOUGH2 using PVM

    SciTech Connect

    Finsterle, Stefan

    1998-10-01

    ITOUGH2 inversions are computationally intensive because the forward problem must be solved many times to evaluate the objective function for different parameter combinations or to numerically calculate sensitivity coefficients. Most of these forward runs are independent from each other and can therefore be performed in parallel. Message passing based on the Parallel Virtual Machine (PVM) system has been implemented into ITOUGH2 to enable parallel processing of ITOUGH2 jobs on a heterogeneous network of Unix workstations. This report describes the PVM system and its implementation into ITOUGH2. Instructions are given for installing PVM, compiling ITOUGH2-PVM for use on a workstation cluster, the preparation of an 1.TOUGH2 input file under PVM, and the execution of an ITOUGH2-PVM application. Examples are discussed, demonstrating the use of ITOUGH2-PVM.

  8. Loop parallelism on Tera MTA using SISAL

    SciTech Connect

    Mitrovic, S.

    1995-11-01

    The difficulty of programming parallel computers has impeded their wide-spread use. The problems are caused by existing hardware and software tools. The software problems on shared-memory and vector computers can be solved by using deterministic high-performance functional languages like SISAL. Distributed-memory computers have even more obstacles than shared-memory parallel machines. Research indicates that multithreaded architectures can hide long latency of distributed memories and that they can solve the problems of locality. Tera`s MTA multiprocessor is based on the concept of multithreading and provides the programmer with a real shared-memory model. This paper investigates the performance of parallel loops written in SISAL and executed on the Tera MTA using the Livermore Loops benchmarks.

  9. Simulating Billion-Task Parallel Programs

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J

    2014-01-01

    In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.

  10. Numerical wind tunnel and parallel FORTRAN

    NASA Astrophysics Data System (ADS)

    Nakamura, Takashi; Yoshida, Masahiro; Fukuda, Masahiro; Takamura, Moriyuki; Okada, Shin

    1992-12-01

    Computational Fluid Dynamics (CFD) requires computers 100 times faster than the Fujitsu VP400 in effective speed. Such a processor can be suitably called the 'Numerical Wind Tunnel'. Numerical Wind Tunnel (NWT) is a parallel computer system of a distributed memory architecture composed of vector processors connected through cross-bar network. In this report, the system configuration, processing element, and interconnection network and communication mechanism of the NWT are shown. Fundamental functions global data, parallel execution of DO-loop, and data decomposition and allocation, which the language-processor system has to provide in order to realize parallel execution on the NWT are also shown. FORTRAN 77 is chosen as a basic programming language for NWT and some compiler directives are added to make effective use of the NWT.

  11. Parallel Computing in SCALE

    SciTech Connect

    DeHart, Mark D; Williams, Mark L; Bowman, Stephen M

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

  12. Parallel Polarization State Generation

    PubMed Central

    She, Alan; Capasso, Federico

    2016-01-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813

  13. Parallel tridiagonal equation solvers

    NASA Technical Reports Server (NTRS)

    Stone, H. S.

    1974-01-01

    Three parallel algorithms were compared for the direct solution of tridiagonal linear systems of equations. The algorithms are suitable for computers such as ILLIAC 4 and CDC STAR. For array computers similar to ILLIAC 4, cyclic odd-even reduction has the least operation count for highly structured sets of equations, and recursive doubling has the least count for relatively unstructured sets of equations. Since the difference in operation counts for these two algorithms is not substantial, their relative running times may be more related to overhead operations, which are not measured in this paper. The third algorithm, based on Buneman's Poisson solver, has more arithmetic operations than the others, and appears to be the least favorable. For pipeline computers similar to CDC STAR, cyclic odd-even reduction appears to be the most preferable algorithm for all cases.

  14. Parallel Polarization State Generation

    NASA Astrophysics Data System (ADS)

    She, Alan; Capasso, Federico

    2016-05-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.

  15. Toward Parallel Document Clustering

    SciTech Connect

    Mogill, Jace A.; Haglin, David J.

    2011-09-01

    A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a triangle inequality obeying distance metric, the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program and initial performance results of end-to-end a document processing workflow are reported.

  16. Unified Parallel Software

    SciTech Connect

    McKay, Mike

    2003-12-01

    UPS (Unified Paralled Software is a collection of software tools libraries, scripts, executables) that assist in parallel programming. This consists of: o libups.a C/Fortran callable routines for message passing (utilities written on top of MPI) and file IO (utilities written on top of HDF). o libuserd-HDF.so EnSight user-defined reader for visualizing data files written with UPS File IO. o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl Executables/scripts to get information from data files and to simplify the use of EnSight on those data files. o ups_io_rm/ups_io_cp Manipulate data files written with UPS File IO These tools are portable to a wide variety of Unix platforms.

  17. Unified Parallel Software

    2003-12-01

    UPS (Unified Paralled Software is a collection of software tools libraries, scripts, executables) that assist in parallel programming. This consists of: o libups.a C/Fortran callable routines for message passing (utilities written on top of MPI) and file IO (utilities written on top of HDF). o libuserd-HDF.so EnSight user-defined reader for visualizing data files written with UPS File IO. o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl Executables/scripts to get information from data files and to simplify the use ofmore » EnSight on those data files. o ups_io_rm/ups_io_cp Manipulate data files written with UPS File IO These tools are portable to a wide variety of Unix platforms.« less

  18. Parallel Imaging Microfluidic Cytometer

    PubMed Central

    Ehrlich, Daniel J.; McKenna, Brian K.; Evans, James G.; Belkina, Anna C.; Denis, Gerald V.; Sherr, David; Cheung, Man Ching

    2011-01-01

    By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of flow cytometry (FACS) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1-D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity and, (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in approximately 6–10 minutes, about 30-times the speed of most current FACS systems. In 1-D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of CCD-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. PMID:21704835

  19. Parallelizing OVERFLOW: Experiences, Lessons, Results

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.

    1999-01-01

    The computer code OVERFLOW is widely used in the aerodynamic community for the numerical solution of the Navier-Stokes equations. Current trends in computer systems and architectures are toward multiple processors and parallelism, including distributed memory. This report describes work that has been carried out by the author and others at Ames Research Center with the goal of parallelizing OVERFLOW using a variety of parallel architectures and parallelization strategies. This paper begins with a brief description of the OVERFLOW code. This description includes the basic numerical algorithm and some software engineering considerations. Next comes a description of a parallel version of OVERFLOW, OVERFLOW/PVM, using PVM (Parallel Virtual Machine). This parallel version of OVERFLOW uses the manager/worker style and is part of the standard OVERFLOW distribution. Then comes a description of a parallel version of OVERFLOW, OVERFLOW/MPI, using MPI (Message Passing Interface). This parallel version of OVERFLOW uses the SPMD (Single Program Multiple Data) style. Finally comes a discussion of alternatives to explicit message-passing in the context of parallelizing OVERFLOW.

  20. The BLAZE language: A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.

  1. The BLAZE language - A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  2. Parallelization and automatic data distribution for nuclear reactor simulations

    SciTech Connect

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  3. Parallel computation with the force

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1985-01-01

    A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiproceossor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.

  4. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Lau, Sonie

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.

  5. Approximating Functions with Exponential Functions

    ERIC Educational Resources Information Center

    Gordon, Sheldon P.

    2005-01-01

    The possibility of approximating a function with a linear combination of exponential functions of the form e[superscript x], e[superscript 2x], ... is considered as a parallel development to the notion of Taylor polynomials which approximate a function with a linear combination of power function terms. The sinusoidal functions sin "x" and cos "x"…

  6. High Performance Parallel Architectures

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek; Kaewpijit, Sinthop

    1998-01-01

    Traditional remote sensing instruments are multispectral, where observations are collected at a few different spectral bands. Recently, many hyperspectral instruments, that can collect observations at hundreds of bands, have been operational. Furthermore, there have been ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these remote sensing technology developments hold great promise for new findings in the area of Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes, and methods for data reduction. Dimension Reduction is a spectral transformation, aimed at concentrating the vital information and discarding redundant data. One such transformation, which is widely used in remote sensing, is the Principal Components Analysis (PCA). This report summarizes our progress on the development of a parallel PCA and its implementation on two Beowulf cluster configuration; one with fast Ethernet switch and the other with a Myrinet interconnection. Details of the implementation and performance results, for typical sets of multispectral and hyperspectral NASA remote sensing data, are presented and analyzed based on the algorithm requirements and the underlying machine configuration. It will be shown that the PCA application is quite challenging and hard to scale on Ethernet-based clusters. However, the measurements also show that a high- performance interconnection network, such as Myrinet, better matches the high communication demand of PCA and can lead to a more efficient PCA execution.

  7. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining

  8. Parallel Adaptive Mesh Refinement

    SciTech Connect

    Diachin, L; Hornung, R; Plassmann, P; WIssink, A

    2005-03-04

    As large-scale, parallel computers have become more widely available and numerical models and algorithms have advanced, the range of physical phenomena that can be simulated has expanded dramatically. Many important science and engineering problems exhibit solutions with localized behavior where highly-detailed salient features or large gradients appear in certain regions which are separated by much larger regions where the solution is smooth. Examples include chemically-reacting flows with radiative heat transfer, high Reynolds number flows interacting with solid objects, and combustion problems where the flame front is essentially a two-dimensional sheet occupying a small part of a three-dimensional domain. Modeling such problems numerically requires approximating the governing partial differential equations on a discrete domain, or grid. Grid spacing is an important factor in determining the accuracy and cost of a computation. A fine grid may be needed to resolve key local features while a much coarser grid may suffice elsewhere. Employing a fine grid everywhere may be inefficient at best and, at worst, may make an adequately resolved simulation impractical. Moreover, the location and resolution of fine grid required for an accurate solution is a dynamic property of a problem's transient features and may not be known a priori. Adaptive mesh refinement (AMR) is a technique that can be used with both structured and unstructured meshes to adjust local grid spacing dynamically to capture solution features with an appropriate degree of resolution. Thus, computational resources can be focused where and when they are needed most to efficiently achieve an accurate solution without incurring the cost of a globally-fine grid. Figure 1.1 shows two example computations using AMR; on the left is a structured mesh calculation of a impulsively-sheared contact surface and on the right is the fuselage and volume discretization of an RAH-66 Comanche helicopter [35]. Note the

  9. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  10. Parallel contingency statistics with Titan.

    SciTech Connect

    Thompson, David C.; Pebay, Philippe Pierre

    2009-09-01

    This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09] which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of this new parallel engines is illustrated by the means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevent this new engine from exhibiting optimal parallel speed-up as the aforementioned engines do. This report therefore discusses the design trade-offs we made and study performance with up to 200 processors.

  11. Coronal Kink Instability With Parallel Thermal Conduction

    NASA Astrophysics Data System (ADS)

    Botha, Gert J. J.; Arber, Tony D.; Hood, Alan W.; Srivastava, A. K.

    2012-01-01

    Thermal conduction along magnetic field lines plays an important role in the evolution of the kink instability in coronal loops. In the nonlinear phase of the instability, local heating occurs due to reconnection, so that the plasma reaches high temperatures. To study the effect of parallel thermal conduction in this process, the 3D nonlinear magnetohydrodynamic (MHD) equations are solved for an initially unstable equilibrium. The initial state is a cylindrical loop with zero net current. Parallel thermal conduction reduces the local temperature, which leads to temperatures that are an order of magnitude lower than those obtained without thermal conduction. This process is important on the timescale of fast MHD phenomena; it reduces the kinetic energy released by an order of magnitude. The impact of this process on observational signatures is presented. Synthetic observables are generated that include spatial and temporal averaging to account for the resolution and exposure times of TRACE images. It was found that the inclusion of parallel thermal conductivity does not have as large an impact on observables as the order of magnitude reduction in the maximum temperature would suggest. The reason is that response functions sample a broad range of temperatures, so that the net effect of parallel thermal conduction is a blurring of internal features of the loop structure.

  12. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Lau, Sonie; Yan, Jerry C.

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

  13. Parallel incremental compilation. Doctoral thesis

    SciTech Connect

    Gafter, N.M.

    1990-06-01

    The time it takes to compile a large program has been a bottleneck in the software development process. When an interactive programming environment with an incremental compiler is used, compilation speed becomes even more important, but existing incremental compilers are very slow for some types of program changes. We describe a set of techniques that enable incremental compilation to exploit fine-grained concurrency in a shared-memory multi-processor and achieve asymptotic improvement over sequential algorithms. Because parallel non-incremental compilation is a special case of parallel incremental compilation, the design of a parallel compiler is a corollary of our result. Instead of running the individual phases concurrently, our design specifies compiler phases that are mutually sequential. However, each phase is designed to exploit fine-grained parallelism. By allowing each phase to present its output as a complete structure rather than as a stream of data, we can apply techniques such as parallel prefix and parallel divide-and-conquer, and we can construct applicative data structures to achieve sublinear execution time. Parallel algorithms for each phase of a compiler are presented to demonstrate that a complete incremental compiler can achieve execution time that is asymptotically less than sequential algorithms.

  14. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens; Inglett, Todd Alan

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.

  15. EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS

    SciTech Connect

    F. PETRINI; W. FENG

    1999-09-01

    We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of low-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.

  16. Experimental Parallel-Processing Computer

    NASA Technical Reports Server (NTRS)

    Mcgregor, J. W.; Salama, M. A.

    1986-01-01

    Master processor supervises slave processors, each with its own memory. Computer with parallel processing serves as inexpensive tool for experimentation with parallel mathematical algorithms. Speed enhancement obtained depends on both nature of problem and structure of algorithm used. In parallel-processing architecture, "bank select" and control signals determine which one, if any, of N slave processor memories accessible to master processor at any given moment. When so selected, slave memory operates as part of master computer memory. When not selected, slave memory operates independently of main memory. Slave processors communicate with each other via input/output bus.

  17. Parallel computation of Feynman loop integrals

    NASA Astrophysics Data System (ADS)

    de Doncker, E.; Yuasa, F.

    2012-12-01

    The need for large numbers of compute-intensive integrals, arising in quantum field theory perturbation calculations, justifies the parallelization of loop integrals. In earlier work, we devised effective multivariate methods by iterated (repeated) adaptive numerical integration and extrapolation, applicable for some problem classes where standard multivariate integration techniques fail through integrand singularities. In repeated integration, the function evaluations for the outer integral are independent integral computations for the next lower level and can be distributed to threads in a multi-core parallelization. The distribution level determines the number of dimensions in the inner integral below it and thus the granularity of the computation. We discuss a multi-threaded implementation over the OpenMP Application Program Interface.

  18. Oxytocin: parallel processing in the social brain?

    PubMed

    Dölen, Gül

    2015-06-01

    Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain.

  19. Oxytocin: parallel processing in the social brain?

    PubMed

    Dölen, Gül

    2015-06-01

    Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. PMID:25912257

  20. Parallel software tools at Langley Research Center

    NASA Technical Reports Server (NTRS)

    Moitra, Stuti; Tennille, Geoffrey M.; Lakeotes, Christopher D.; Randall, Donald P.; Arthur, Jarvis J.; Hammond, Dana P.; Mall, Gerald H.

    1993-01-01

    This document gives a brief overview of parallel software tools available on the Intel iPSC/860 parallel computer at Langley Research Center. It is intended to provide a source of information that is somewhat more concise than vendor-supplied material on the purpose and use of various tools. Each of the chapters on tools is organized in a similar manner covering an overview of the functionality, access information, how to effectively use the tool, observations about the tool and how it compares to similar software, known problems or shortfalls with the software, and reference documentation. It is primarily intended for users of the iPSC/860 at Langley Research Center and is appropriate for both the experienced and novice user.

  1. Flexibility and Performance of Parallel File Systems

    NASA Technical Reports Server (NTRS)

    Kotz, David; Nieuwejaar, Nils

    1996-01-01

    As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions make applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (API's). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.

  2. Parallel algorithms for matrix computations

    SciTech Connect

    Plemmons, R.J.

    1990-01-01

    The present conference on parallel algorithms for matrix computations encompasses both shared-memory systems and distributed-memory systems, as well as combinations of the two, to provide an overall perspective on parallel algorithms for both dense and sparse matrix computations in solving systems of linear equations, dense or structured problems related to least-squares computations, eigenvalue computations, singular-value computations, and rapid elliptic solvers. Specific issues addressed include the influence of parallel and vector architectures on algorithm design, computations for distributed-memory architectures such as hypercubes, solutions for sparse symmetric positive definite linear systems, symbolic and numeric factorizations, and triangular solutions. Also addressed are reference sources for parallel and vector numerical algorithms, sources for machine architectures, and sources for programming languages.

  3. Parallel architectures and neural networks

    SciTech Connect

    Calianiello, E.R. )

    1989-01-01

    This book covers parallel computer architectures and neural networks. Topics include: neural modeling, use of ADA to simulate neural networks, VLSI technology, implementation of Boltzmann machines, and analysis of neural nets.

  4. "Feeling" Series and Parallel Resistances.

    ERIC Educational Resources Information Center

    Morse, Robert A.

    1993-01-01

    Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)

  5. Demonstrating Forces between Parallel Wires.

    ERIC Educational Resources Information Center

    Baker, Blane

    2000-01-01

    Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)

  6. Metal structures with parallel pores

    NASA Technical Reports Server (NTRS)

    Sherfey, J. M.

    1976-01-01

    Four methods of fabricating metal plates having uniformly sized parallel pores are studied: elongate bundle, wind and sinter, extrude and sinter, and corrugate stack. Such plates are suitable for electrodes for electrochemical and fuel cells.

  7. Parallel computation using limited resources

    SciTech Connect

    Sugla, B.

    1985-01-01

    This thesis addresses itself to the task of designing and analyzing parallel algorithms when the resources of processors, communication, and time are limited. The two parts of this thesis deal with multiprocessor systems and VLSI - the two important parallel processing environments that are prevalent today. In the first part a time-processor-communication tradeoff analysis is conducted for two kinds of problems - N input, 1 output, and N input, N output computations. In the class of problems of the second kind, the problem of prefix computation, an important problem due to the number of naturally occurring computations it can model, is studied. Finally, a general methodology is given for design of parallel algorithms that can be used to optimize a given design to a wide set of architectural variations. The second part of the thesis considers the design of parallel algorithms for the VLSI model of computation when the resource of time is severely restricted.

  8. Parallel algorithms for message decomposition

    SciTech Connect

    Teng, S.H.; Wang, B.

    1987-06-01

    The authors consider the deterministic and random parallel complexity (time and processor) of message decoding: an essential problem in communications systems and translation systems. They present an optimal parallel algorithm to decompose prefix-coded messages and uniquely decipherable-coded messages in O(n/P) time, using O(P) processors (for all P:1 less than or equal toPless than or equal ton/log n) deterministically as well as randomly on the weakest version of parallel random access machines in which concurrent read and concurrent write to a cell in the common memory are not allowed. This is done by reducing decoding to parallel finite-state automata simulation and the prefix sums.

  9. Turbomachinery CFD on parallel computers

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.

    1992-01-01

    The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.

  10. Graphics applications utilizing parallel processing

    NASA Technical Reports Server (NTRS)

    Rice, John R.

    1990-01-01

    The results are presented of research conducted to develop a parallel graphic application algorithm to depict the numerical solution of the 1-D wave equation, the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation and the strategies used to alleviate the effects of the synchronization overhead are discussed.

  11. HEATR project: ATR algorithm parallelization

    NASA Astrophysics Data System (ADS)

    Deardorf, Catherine E.

    1998-09-01

    High Performance Computing (HPC) Embedded Application for Target Recognition (HEATR) is a project funded by the High Performance Computing Modernization Office through the Common HPC Software Support Initiative (CHSSI). The goal of CHSSI is to produce portable, parallel, multi-purpose, freely distributable, support software to exploit emerging parallel computing technologies and enable application of scalable HPC's for various critical DoD applications. Specifically, the CHSSI goal for HEATR is to provide portable, parallel versions of several existing ATR detection and classification algorithms to the ATR-user community to achieve near real-time capability. The HEATR project will create parallel versions of existing automatic target recognition (ATR) detection and classification algorithms and generate reusable code that will support porting and software development process for ATR HPC software. The HEATR Team has selected detection/classification algorithms from both the model- based and training-based (template-based) arena in order to consider the parallelization requirements for detection/classification algorithms across ATR technology. This would allow the Team to assess the impact that parallelization would have on detection/classification performance across ATR technology. A field demo is included in this project. Finally, any parallel tools produced to support the project will be refined and returned to the ATR user community along with the parallel ATR algorithms. This paper will review: (1) HPCMP structure as it relates to HEATR, (2) Overall structure of the HEATR project, (3) Preliminary results for the first algorithm Alpha Test, (4) CHSSI requirements for HEATR, and (5) Project management issues and lessons learned.

  12. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. c2001 The Willi Hennig Society.

  13. Modelling Layer parallel stylolites

    NASA Astrophysics Data System (ADS)

    Koehn, Daniel; Pataki Rood, Daisy; Beaudoin, Nicolas

    2016-04-01

    We modeled the geometrical roughening of mainly layer-dominated stylolites in order to understand their structural evolution, to present an advanced classification of stylolite shapes and to relate this classification to chemical compaction and stylolite sealing capabilities. Our simulations show that layer-dominated stylolites can grow in three distinct stages, an initial slow nucleation, a fast layer-pinning phase and a final freezing stage if the layer dissolves completely during growth. Dissolution of the pinning layer and thus destruction of the compaction tracking capabilities is a function of the background noise in the rock and the dissolution rate of the layer itself. Low background noise needs a slower dissolving layer for pinning to be successful but produces flatter teeth than higher background noise. We present an advanced classification based on our simulations and separate stylolites into four classes: rectangular layer type, seismogram pinning type, suture/sharp peak type and simple wave-like type.

  14. Memory Scalability and Efficiency Analysis of Parallel Codes

    SciTech Connect

    Janjusic, Tommy; Kartsaklis, Christos

    2015-01-01

    Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for High Performance Systems are typically designed over the span of several, and in some instances 10+, years. As a result, optimization practices which were appropriate for earlier systems may no longer be valid and thus require careful optimization consideration. Specifically, parallel codes whose memory footprint is a function of their scalability must be carefully considered for future exa-scale systems. In this paper we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology we evaluate an application s memory footprint as a function of scalability, which we coined memory efficiency, and describe our results. In particular, using our in-house tools we can pinpoint the specific application components which contribute to the application s overall memory foot-print (application data- structures, libraries, etc.).

  15. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-18

    Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  16. Nemesis I: Parallel Enhancements to ExodusII

    2006-03-28

    NEMESIS I is an enhancement to the EXODUS II finite element database model used to store and retrieve data for unstructured parallel finite element analyses. NEMESIS I adds data structures which facilitate the partitioning of a scalar (standard serial) EXODUS II file onto parallel disk systems found on many parallel computers. Since the NEMESIS I application programming interface (APl)can be used to append information to an existing EXODUS II files can be used on filesmore » which contain NEMESIS I information. The NEMESIS I information is written and read via C or C++ callable functions which compromise the NEMESIS I API.« less

  17. Chare kernel; A runtime support system for parallel computations

    SciTech Connect

    Shu, W. ); Kale, L.V. )

    1991-03-01

    This paper presents the chare kernel system, which supports parallel computations with irregular structure. The chare kernel is a collection of primitive functions that manage chares, manipulative messages, invoke atomic computations, and coordinate concurrent activities. Programs written in the chare kernel language can be executed on different parallel machines without change. Users writing such programs concern themselves with the creation of parallel actions but not with assigning them to specific processors. The authors describe the design and implementation of the chare kernel. Performance of chare kernel programs on two hypercube machines, the Intel iPSC/2 and the NCUBE, is also given.

  18. Nemesis I: Parallel Enhancements to ExodusII

    SciTech Connect

    Hennigan, Gary L.; John, Matthew S.; Shadid, John N.; Sjaardema, Gregory D.; Brayer, Mike S.

    2006-03-28

    NEMESIS I is an enhancement to the EXODUS II finite element database model used to store and retrieve data for unstructured parallel finite element analyses. NEMESIS I adds data structures which facilitate the partitioning of a scalar (standard serial) EXODUS II file onto parallel disk systems found on many parallel computers. Since the NEMESIS I application programming interface (APl)can be used to append information to an existing EXODUS II files can be used on files which contain NEMESIS I information. The NEMESIS I information is written and read via C or C++ callable functions which compromise the NEMESIS I API.

  19. CALTRANS: A parallel, deterministic, 3D neutronics code

    SciTech Connect

    Carson, L.; Ferguson, J.; Rogers, J.

    1994-04-01

    Our efforts to parallelize the deterministic solution of the neutron transport equation has culminated in a new neutronics code CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementation of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.

  20. Programming parallel architectures: The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  1. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  2. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective opeartion through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  3. Interface for Parallel I/O from Componentized Visualization Algorithms

    2008-09-16

    The software is an interface layer over file I/O with features specifically designed for efficient parallel reads and writes. The interface provides multiple concrete implementations that easily allow the replacement of one interface with another. This feature allows a reader or writer implementation to work independently of whether parallel file I/O is available or desired. The software also contains extensions to some readers to allow it to use the file I/O functionality.

  4. A class of trust-region methods for parallel optimization

    SciTech Connect

    P. D. Hough; J. C. Meza

    1999-03-01

    The authors present a new class of optimization methods that incorporates a Parallel Direct Search (PDS) method within a trust-region Newton framework. This approach combines the inherent parallelism of PDS with the rapid and robust convergence properties of Newton methods. Numerical tests have yielded favorable results for both standard test problems and engineering applications. In addition, the new method appears to be more robust in the presence of noisy functions that are inherent in many engineering simulations.

  5. Parallel computation and computers for artificial intelligence

    SciTech Connect

    Kowalik, J.S. )

    1988-01-01

    This book discusses Parallel Processing in Artificial Intelligence; Parallel Computing using Multilisp; Execution of Common Lisp in a Parallel Environment; Qlisp; Restricted AND-Parallel Execution of Logic Programs; PARLOG: Parallel Programming in Logic; and Data-driven Processing of Semantic Nets. Attention is also given to: Application of the Butterfly Parallel Processor in Artificial Intelligence; On the Range of Applicability of an Artificial Intelligence Machine; Low-level Vision on Warp and the Apply Programming Mode; AHR: A Parallel Computer for Pure Lisp; FAIM-1: An Architecture for Symbolic Multi-processing; and Overview of Al Application Oriented Parallel Processing Research in Japan.

  6. Kalman Filter Tracking on Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2015-12-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques including Cellular Automata or returning to Hough Transform. The most common track finding techniques in use today are however those based on the Kalman Filter [2]. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust and are exactly those being used today for the design of the tracking system for HL-LHC. Our previous investigations showed that, using optimized data structures, track fitting with Kalman Filter can achieve large speedup both with Intel Xeon and Xeon Phi. We report here our further progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a realistic simulation setup.

  7. Parallel Quantum Circuit in a Tunnel Junction

    NASA Astrophysics Data System (ADS)

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian; GNS theory Group Team

    In between 2 metallic nanopads, adding identical and independent electron transfer paths in parallel increases the electronic effective coupling between the 2 nanopads through the quantum circuit defined by those paths. Measuring this increase of effective coupling using the tunnelling current intensity can lead for example for 2 paths in parallel to the now standard G =G1 +G2 + 2√{G1 .G2 } conductance superposition law (1). This is only valid for the tunnelling regime (2). For large electronic coupling to the nanopads (or at resonance), G can saturate and even decay as a function of the number of parallel paths added in the quantum circuit (3). We provide here the explanation of this phenomenon: the measurement of the effective Rabi oscillation frequency using the current intensity is constrained by the normalization principle of quantum mechanics. This limits the quantum conductance G for example to go when there is only one channel per metallic nanopads. This ef fect has important consequences for the design of Boolean logic gates at the atomic scale using atomic scale or intramolecular circuits. References: This has the financial support by European PAMS project.

  8. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  9. Computing contingency statistics in parallel.

    SciTech Connect

    Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

    2010-09-01

    Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and {chi}{sup 2} independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel.We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.

  10. FEREBUS: Highly parallelized engine for kriging training.

    PubMed

    Di Pasquale, Nicodemo; Bane, Michael; Davie, Stuart J; Popelier, Paul L A

    2016-11-01

    FFLUX is a novel force field based on quantum topological atoms, combining multipolar electrostatics with IQA intraatomic and interatomic energy terms. The program FEREBUS calculates the hyperparameters of models produced by the machine learning method kriging. Calculation of kriging hyperparameters (θ and p) requires the optimization of the concentrated log-likelihood L̂(θ,p). FEREBUS uses Particle Swarm Optimization (PSO) and Differential Evolution (DE) algorithms to find the maximum of L̂(θ,p). PSO and DE are two heuristic algorithms that each use a set of particles or vectors to explore the space in which L̂(θ,p) is defined, searching for the maximum. The log-likelihood is a computationally expensive function, which needs to be calculated several times during each optimization iteration. The cost scales quickly with the problem dimension and speed becomes critical in model generation. We present the strategy used to parallelize FEREBUS, and the optimization of L̂(θ,p) through PSO and DE. The code is parallelized in two ways. MPI parallelization distributes the particles or vectors among the different processes, whereas the OpenMP implementation takes care of the calculation of L̂(θ,p), which involves the calculation and inversion of a particular matrix, whose size increases quickly with the dimension of the problem. The run time shows a speed-up of 61 times going from single core to 90 cores with a saving, in one case, of ∼98% of the single core time. In fact, the parallelization scheme presented reduces computational time from 2871 s for a single core calculation, to 41 s for 90 cores calculation. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:27649926

  11. Parallel Proximity Detection for Computer Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1998-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  12. Parallel Proximity Detection for Computer Simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1997-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are includes by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  13. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.

  14. PARAVT: Parallel Voronoi tessellation code

    NASA Astrophysics Data System (ADS)

    González, R. E.

    2016-10-01

    In this study, we present a new open source code for massive parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is focused for astrophysical purposes where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes, however no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI and VT using Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes neighbors list, Voronoi density, Voronoi cell volume, density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.

  15. Visualizing Parallel Computer System Performance

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.

    1988-01-01

    Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed, almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels, it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkedly adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized Large, complex parallel system pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eluding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.

  16. Fast data parallel polygon rendering

    SciTech Connect

    Ortega, F.A.; Hansen, C.D.

    1993-09-01

    This paper describes a parallel method for polygonal rendering on a massively parallel SIMD machine. This method, based on a simple shading model, is targeted for applications which require very fast polygon rendering for extremely large sets of polygons such as is found in many scientific visualization applications. The algorithms described in this paper are incorporated into a library of 3D graphics routines written for the Connection Machine. The routines are implemented on both the CM-200 and the CM-5. This library enables a scientists to display 3D shaded polygons directly from a parallel machine without the need to transmit huge amounts of data to a post-processing rendering system.

  17. Parallel integrated frame synchronizer chip

    NASA Technical Reports Server (NTRS)

    Ghuman, Parminder Singh (Inventor); Solomon, Jeffrey Michael (Inventor); Bennett, Toby Dennis (Inventor)

    2000-01-01

    A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.

  18. Massively Parallel MRI Detector Arrays

    PubMed Central

    Keil, Boris; Wald, Lawrence L

    2013-01-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758

  19. Parallel transport of long mean-free-path plasma along open magnetic field lines: Parallel heat flux

    SciTech Connect

    Guo Zehua; Tang Xianzhu

    2012-06-15

    In a long mean-free-path plasma where temperature anisotropy can be sustained, the parallel heat flux has two components with one associated with the parallel thermal energy and the other the perpendicular thermal energy. Due to the large deviation of the distribution function from local Maxwellian in an open field line plasma with low collisionality, the conventional perturbative calculation of the parallel heat flux closure in its local or non-local form is no longer applicable. Here, a non-perturbative calculation is presented for a collisionless plasma in a two-dimensional flux expander bounded by absorbing walls. Specifically, closures of previously unfamiliar form are obtained for ions and electrons, which relate two distinct components of the species parallel heat flux to the lower order fluid moments such as density, parallel flow, parallel and perpendicular temperatures, and the field quantities such as the magnetic field strength and the electrostatic potential. The plasma source and boundary condition at the absorbing wall enter explicitly in the closure calculation. Although the closure calculation does not take into account wave-particle interactions, the results based on passing orbits from steady-state collisionless drift-kinetic equation show remarkable agreement with fully kinetic-Maxwell simulations. As an example of the physical implications of the theory, the parallel heat flux closures are found to predict a surprising observation in the kinetic-Maxwell simulation of the 2D magnetic flux expander problem, where the parallel heat flux of the parallel thermal energy flows from low to high parallel temperature region.

  20. Parallel algorithms for mapping pipelined and parallel computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.

  1. Hybrid parallel programming with MPI and Unified Parallel C.

    SciTech Connect

    Dinan, J.; Balaji, P.; Lusk, E.; Sadayappan, P.; Thakur, R.; Mathematics and Computer Science; The Ohio State Univ.

    2010-01-01

    The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance; using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33 and using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.

  2. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-03-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processors. User program and their gangs of processors are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quantums are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.

  3. Medipix2 parallel readout system

    NASA Astrophysics Data System (ADS)

    Fanti, V.; Marzeddu, R.; Randaccio, P.

    2003-08-01

    A fast parallel readout system based on a PCI board has been developed in the framework of the Medipix collaboration. The readout electronics consists of two boards: the motherboard directly interfacing the Medipix2 chip, and the PCI board with digital I/O ports 32 bits wide. The device driver and readout software have been developed at low level in Assembler to allow fast data transfer and image reconstruction. The parallel readout permits a transfer rate up to 64 Mbytes/s. http://medipix.web.cern ch/MEDIPIX/

  4. Measurements of parallel electron velocity distributions using whistler wave absorption.

    PubMed

    Thuecks, D J; Skiff, F; Kletzing, C A

    2012-08-01

    We describe a diagnostic to measure the parallel electron velocity distribution in a magnetized plasma that is overdense (ω(pe) > ω(ce)). This technique utilizes resonant absorption of whistler waves by electrons with velocities parallel to a background magnetic field. The whistler waves were launched and received by a pair of dipole antennas immersed in a cylindrical discharge plasma at two positions along an axial background magnetic field. The whistler wave frequency was swept from somewhat below and up to the electron cyclotron frequency ω(ce). As the frequency was swept, the wave was resonantly absorbed by the part of the electron phase space density which was Doppler shifted into resonance according to the relation ω - k([parallel])v([parallel]) = ω(ce). The measured absorption is directly related to the reduced parallel electron distribution function integrated along the wave trajectory. The background theory and initial results from this diagnostic are presented here. Though this diagnostic is best suited to detect tail populations of the parallel electron distribution function, these first results show that this diagnostic is also rather successful in measuring the bulk plasma density and temperature both during the plasma discharge and into the afterglow.

  5. Parallel graph reduction on a supercomputer: A status report

    SciTech Connect

    Michelsen, R.; Smith, L.; Williams, E.; Yantis, B.

    1986-01-01

    We describe an ongoing effort to develop a parallel graph reduction run-time system hosted on a multiprocessor supercomputer. This run-time system is presently augmented by the functional language compiler of the ALFALFA system. Admittedly, parallel graph reduction is hardly a novel idea. The interesting notion is the provision of a parallel execution environment sufficiently powerful to support development of large prototypical scientific and symbolic codes in a functional language. This will allow the investigation of the functional programming paradigm within these application domains in an empirical fashion. In this paper, we discuss the motivation for the effort, describe the basic elements of the implementation, and provide some preliminary insight distilled from our experience with an initial version of the run-time system.

  6. Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

    PubMed

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-06-01

    We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions. PMID:22345529

  7. File concepts for parallel I/O

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1989-01-01

    The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, organizations are suggested, using multiple storage devices. Problem areas are also identified and discussed.

  8. Parallel, Distributed Scripting with Python

    SciTech Connect

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider the a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.

  9. Parallel distributed computing using Python

    NASA Astrophysics Data System (ADS)

    Dalcin, Lisandro D.; Paz, Rodrigo R.; Kler, Pablo A.; Cosimo, Alejandro

    2011-09-01

    This work presents two software components aimed to relieve the costs of accessing high-performance parallel computing resources within a Python programming environment: MPI for Python and PETSc for Python. MPI for Python is a general-purpose Python package that provides bindings for the Message Passing Interface (MPI) standard using any back-end MPI implementation. Its facilities allow parallel Python programs to easily exploit multiple processors using the message passing paradigm. PETSc for Python provides access to the Portable, Extensible Toolkit for Scientific Computation (PETSc) libraries. Its facilities allow sequential and parallel Python applications to exploit state of the art algorithms and data structures readily available in PETSc for the solution of large-scale problems in science and engineering. MPI for Python and PETSc for Python are fully integrated to PETSc-FEM, an MPI and PETSc based parallel, multiphysics, finite elements code developed at CIMEC laboratory. This software infrastructure supports research activities related to simulation of fluid flows with applications ranging from the design of microfluidic devices for biochemical analysis to modeling of large-scale stream/aquifer interactions.

  10. Parallelizing AT with MatlabMPI

    SciTech Connect

    Li, Evan Y.; /Brown U. /SLAC

    2011-06-22

    The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations, rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, which set up the necessary pre-requisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate incredibly efficient speed increments per processor in AT's beam-tracking functions. Extrapolating from prediction, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well-understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass, cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while

  11. Do transform faults parallel plate motion?

    NASA Astrophysics Data System (ADS)

    Mishra, J. K.; Gordon, R. G.; Demets, C.; Argus, D.

    2009-12-01

    A central principle of plate tectonics is that relative plate motion is parallel to transform faults. Several workers have convincingly argued, however, that transform fault valleys widen with age due to horizontal thermal contraction of the lithosphere [Collette 1974; Roest et al 1986 ; Sandwell 1986]. If so, then the transform fault zone, which is the locus of active strike-slip faulting in a transform fault, may not be parallel to the direction of relative plate motion, but parallel to the relative velocity of the walls of the transform valley. Here we apply a recent model for horizontal contraction of oceanic lithosphere as a function of age [Kumar & Gordon 2009] to calculate this bias in transform azimuth as a function of offset, spreading rate, and ridge length, with slightly different formulations for crenelate and stepping mid-ocean ridge segments. The calculated bias increases with increasing length of ridge segments between transform faults and decreases with increasing offset of ridge segments along transform faults. We used the calculated bias to estimate corrections of azimuths of transform faults on eight plate boundaries from the MORVEL data set [DeMets, Argus, & Gordon, 2009]. The largest correction is 2.3 degrees, but the median correction is merely 0.2 degrees. We express the null hypothesis (no thermal contraction) and our hypothesis (thermal contraction as predicted by Kumar and Gordon [2009]), in terms of a parameter α, which is zero for no thermal contraction and 1 for predicted thermal contraction. When the results for the 8 plate pairs are combined, we find that α = 1.1 ± 0.7 (95% confidence limits), thus excluding the null hypothesis (no thermal contraction), but consistent with the hypothesis of horizontal thermal contraction.

  12. Do transform faults parallel plate motion?

    NASA Astrophysics Data System (ADS)

    Mishra, J. K.; Gordon, R. G.

    2008-12-01

    A central principle of plate tectonics is that relative plate motion is parallel to transform faults. Several workers have convincingly argued, however, that transform fault valleys widen with age due to horizontal thermal contraction of the lithosphere (Collette 1974; Roest et al 1986 ; Sandwell 1986). If so, then the transform fault zone, which is the locus of active strike-slip faulting in a transform fault, is not parallel to the direction of plate motion. It is also affected by the ridge-parallel contraction of the lithosphere and is biased by a predictable amount. Here we apply a recent model for horizontal contraction of oceanic lithosphere as a function of age (Kumar & Gordon 2008) to calculate this bias as a function of offset, spreading rate, and ridge length, with slightly different formulations for crenelate and stepping mid-ocean ridge segments. The bias causes right-slipping transform faults to be counter-clockwise of the true plate motion direction while left-slipping transform faults are clockwise of the true plate motion direction. The bias is larger for longer ridge segments, smaller offsets, and slower spreading. The bias ranges in magnitude from about 0.1 degree to 1.5 degrees for stepping boundaries and 0.2 degree to 3 degrees for crenelate boundaries. Application of the bias correction to NUVEL-1 transform fault azimuths about the Rodrigues, Juan Fernandez, Galapagos, and Bouvet triple junctions improves the closure of the triple junction in the first three cases but makes it slightly worse in the fourth case. Additional applications will be presented.

  13. Parallel execution of LISP programs

    SciTech Connect

    Weening, J.S.

    1989-01-01

    This dissertation considers several issues in the execution of Lisp programs on shared-memory multiprocessors. An overview of constructs for explicit parallelism in Lisp is first presented. The problems of partitioning a program into processes and scheduling these processes are then described, and a number of methods for performing these are proposed. These include cutting off process creation based on properties of the computation tree of the program, and basing partitioning decisions on the state of the system at runtime instead of the program. An experimental study of these methods has been performed using a simulator for parallel Lisp. The simulator, written in common Lisp using a continuation-passing style, is described in detail. This is followed by a description of the experiments that were performed and an analysis of the results. Two programs are used as illustrations-a Fast Fourier Transform, which has an abundance of parallelism, and the Cocke-Younger-Kasami parsing algorithm, for which good speedup is not as easy to obtain. The difficulty of using cutoff-based partitioning methods, and the differences between various scheduling methods, are shown. A combination of partitioning and scheduling methods which the author calls dynamic partitioning is analyzed in more detail. This method is based on examining the machine's runtime state; it requires that the programmer only identify parallelism in the program, without deciding which potential parallelism is actually useful. Several theorems are proved providing upper bounds on the amount of overhead produced by this method. He concludes that for programs whose computation trees have small height relative to their total size, dynamic partitioning can achieve asymptotically minimal overhead in the cost of process creation.

  14. Parallel global optimization with the particle swarm algorithm.

    PubMed

    Schutte, J F; Reinbolt, J A; Fregly, B J; Haftka, R T; George, A D

    2004-12-01

    Present day engineering optimization problems often impose large computational demands, resulting in long solution times even on a modern high-end processor. To obtain enhanced computational throughput and global search capability, we detail the coarse-grained parallelization of an increasingly popular global search method, the particle swarm optimization (PSO) algorithm. Parallel PSO performance was evaluated using two categories of optimization problems possessing multiple local minima-large-scale analytical test problems with computationally cheap function evaluations and medium-scale biomechanical system identification problems with computationally expensive function evaluations. For load-balanced analytical test problems formulated using 128 design variables, speedup was close to ideal and parallel efficiency above 95% for up to 32 nodes on a Beowulf cluster. In contrast, for load-imbalanced biomechanical system identification problems with 12 design variables, speedup plateaued and parallel efficiency decreased almost linearly with increasing number of nodes. The primary factor affecting parallel performance was the synchronization requirement of the parallel algorithm, which dictated that each iteration must wait for completion of the slowest fitness evaluation. When the analytical problems were solved using a fixed number of swarm iterations, a single population of 128 particles produced a better convergence rate than did multiple independent runs performed using sub-populations (8 runs with 16 particles, 4 runs with 32 particles, or 2 runs with 64 particles). These results suggest that (1) parallel PSO exhibits excellent parallel performance under load-balanced conditions, (2) an asynchronous implementation would be valuable for real-life problems subject to load imbalance, and (3) larger population sizes should be considered when multiple processors are available.

  15. Parallel global optimization with the particle swarm algorithm

    PubMed Central

    Schutte, J. F.; Reinbolt, J. A.; Fregly, B. J.; Haftka, R. T.; George, A. D.

    2007-01-01

    SUMMARY Present day engineering optimization problems often impose large computational demands, resulting in long solution times even on a modern high-end processor. To obtain enhanced computational throughput and global search capability, we detail the coarse-grained parallelization of an increasingly popular global search method, the particle swarm optimization (PSO) algorithm. Parallel PSO performance was evaluated using two categories of optimization problems possessing multiple local minima—large-scale analytical test problems with computationally cheap function evaluations and medium-scale biomechanical system identification problems with computationally expensive function evaluations. For load-balanced analytical test problems formulated using 128 design variables, speedup was close to ideal and parallel efficiency above 95% for up to 32 nodes on a Beowulf cluster. In contrast, for load-imbalanced biomechanical system identification problems with 12 design variables, speedup plateaued and parallel efficiency decreased almost linearly with increasing number of nodes. The primary factor affecting parallel performance was the synchronization requirement of the parallel algorithm, which dictated that each iteration must wait for completion of the slowest fitness evaluation. When the analytical problems were solved using a fixed number of swarm iterations, a single population of 128 particles produced a better convergence rate than did multiple independent runs performed using sub-populations (8 runs with 16 particles, 4 runs with 32 particles, or 2 runs with 64 particles). These results suggest that (1) parallel PSO exhibits excellent parallel performance under load-balanced conditions, (2) an asynchronous implementation would be valuable for real-life problems subject to load imbalance, and (3) larger population sizes should be considered when multiple processors are available. PMID:17891226

  16. Parallel tetrahedral mesh refinement with MOAB.

    SciTech Connect

    Thompson, David C.; Pebay, Philippe Pierre

    2008-12-01

    In this report, we present the novel functionality of parallel tetrahedral mesh refinement which we have implemented in MOAB. This report details work done to implement parallel, edge-based, tetrahedral refinement into MOAB. The theoretical basis for this work is contained in [PT04, PT05, TP06] while information on design, performance, and operation specific to MOAB are contained herein. As MOAB is intended mainly for use in pre-processing and simulation (as opposed to the post-processing bent of previous papers), the primary use case is different: rather than refining elements with non-linear basis functions, the goal is to increase the number of degrees of freedom in some region in order to more accurately represent the solution to some system of equations that cannot be solved analytically. Also, MOAB has a unique mesh representation which impacts the algorithm. This introduction contains a brief review of streaming edge-based tetrahedral refinement. The remainder of the report is broken into three sections: design and implementation, performance, and conclusions. Appendix A contains instructions for end users (simulation authors) on how to employ the refiner.

  17. Parallel payers, privatization and two-tier healthcare in Canada.

    PubMed

    Davidson, Alan

    2008-01-01

    The commissioning of care by Workers' Compensation Boards alongside provincial healthcare insurance plans functions in Canada in much the same way as parallel private insurance functions in countries like England and Australia. Parallel payers introduce policy conflict, undermine equity and promote privatization. WCB demands for expedited care for injured workers create challenges for the efficiency and fairness of the healthcare system. Unfortunately, the legitimate policy of goals of WCB and universal healthcare insurance are difficult to reconcile in the real world of Canadian healthcare policy. PMID:18493171

  18. Predicting mining activity with parallel genetic algorithms

    USGS Publications Warehouse

    Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.; ,; ,

    2005-01-01

    We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.

  19. Merlin - Massively parallel heterogeneous computing

    NASA Technical Reports Server (NTRS)

    Wittie, Larry; Maples, Creve

    1989-01-01

    Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.

  20. A generalized parallel replica dynamics

    SciTech Connect

    Binder, Andrew; Lelièvre, Tony; Simpson, Gideon

    2015-03-01

    Metastability is a common obstacle to performing long molecular dynamics simulations. Many numerical methods have been proposed to overcome it. One method is parallel replica dynamics, which relies on the rapid convergence of the underlying stochastic process to a quasi-stationary distribution. Two requirements for applying parallel replica dynamics are knowledge of the time scale on which the process converges to the quasi-stationary distribution and a mechanism for generating samples from this distribution. By combining a Fleming–Viot particle system with convergence diagnostics to simultaneously identify when the process converges while also generating samples, we can address both points. This variation on the algorithm is illustrated with various numerical examples, including those with entropic barriers and the 2D Lennard-Jones cluster of seven atoms.

  1. Scans as primitive parallel operations

    SciTech Connect

    Blelloch, G.E. . Dept. of Computer Science)

    1989-11-01

    In most parallel random access machine (PRAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can execute in no more time than these parallel memory references. This paper outlines an extensive study of the effect of including, in the PRAM models, such scan operations as unit-time primitives. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(log n) factor greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. The authors argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design. These all run on an EREW PRAM with the addition of two scan primitives.

  2. Two Level Parallel Grammatical Evolution

    NASA Astrophysics Data System (ADS)

    Ošmera, Pavel

    This paper describes a Two Level Parallel Grammatical Evolution (TLPGE) that can evolve complete programs using a variable length linear genome to govern the mapping of a Backus Naur Form grammar definition. To increase the efficiency of Grammatical Evolution (GE) the influence of backward processing was tested and a second level with differential evolution was added. The significance of backward coding (BC) and the comparison with standard coding of GEs is presented. The new method is based on parallel grammatical evolution (PGE) with a backward processing algorithm, which is further extended with a differential evolution algorithm. Thus a two-level optimization method was formed in attempt to take advantage of the benefits of both original methods and avoid their difficulties. Both methods used are discussed and the architecture of their combination is described. Also application is discussed and results on a real-word application are described.

  3. Parallel multiplex laser feedback interferometry

    SciTech Connect

    Zhang, Song; Tan, Yidong; Zhang, Shulian

    2013-12-15

    We present a parallel multiplex laser feedback interferometer based on spatial multiplexing which avoids the signal crosstalk in the former feedback interferometer. The interferometer outputs two close parallel laser beams, whose frequencies are shifted by two acousto-optic modulators by 2Ω simultaneously. A static reference mirror is inserted into one of the optical paths as the reference optical path. The other beam impinges on the target as the measurement optical path. Phase variations of the two feedback laser beams are simultaneously measured through heterodyne demodulation with two different detectors. Their subtraction accurately reflects the target displacement. Under typical room conditions, experimental results show a resolution of 1.6 nm and accuracy of 7.8 nm within the range of 100 μm.

  4. Parallel processing spacecraft communication system

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)

    1998-01-01

    An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used; and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID. Each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If that channel number is zero, however, it indicates that the frame of data represents a critical command only. That data is handled in a special way, independent of the software. Otherwise, the processed data further handled using special double buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.

  5. Parallelizing the XSTAR Photoionization Code

    NASA Astrophysics Data System (ADS)

    Noble, M. S.; Ji, L.; Young, A.; Lee, J. C.

    2009-09-01

    We describe two means by which XSTAR, a code which computes physical conditions and emission spectra of photoionized gases, has been parallelized. The first is pvmxstar, a wrapper which can be used in place of the serial xstar2xspec script to foster concurrent execution of the XSTAR command line application on independent sets of parameters. The second is pmodel, a plugin for the Interactive Spectral Interpretation System (ISIS) which allows arbitrary components of a broad range of astrophysical models to be distributed across processors during fitting and confidence limits calculations, by scientists with little training in parallel programming. Plugging the XSTAR family of analytic models into pmodel enables multiple ionization states (e.g., of a complex absorber/emitter) to be computed simultaneously, alleviating the often prohibitive expense of the traditional serial approach. Initial performance results indicate that these methods substantially enlarge the problem space to which XSTAR may be applied within practical timeframes.

  6. Parallel strategies for SAR processing

    NASA Astrophysics Data System (ADS)

    Segoviano, Jesus A.

    2004-12-01

    This article proposes a series of strategies for improving the computer process of the Synthetic Aperture Radar (SAR) signal treatment, following the three usual lines of action to speed up the execution of any computer program. On the one hand, it is studied the optimization of both, the data structures and the application architecture used on it. On the other hand it is considered a hardware improvement. For the former, they are studied both, the usually employed SAR process data structures, proposing the use of parallel ones and the way the parallelization of the algorithms employed on the process is implemented. Besides, the parallel application architecture classifies processes between fine/coarse grain. These are assigned to individual processors or separated in a division among processors, all of them in their corresponding architectures. For the latter, it is studied the hardware employed on the computer parallel process used in the SAR handling. The improvement here refers to several kinds of platforms in which the SAR process is implemented, shared memory multicomputers, and distributed memory multiprocessors. A comparison between them gives us some guidelines to follow in order to get a maximum throughput with a minimum latency and a maximum effectiveness with a minimum cost, all together with a limited complexness. It is concluded and described, that the approach consisting of the processing of the algorithms in a GNU/Linux environment, together with a Beowulf cluster platform offers, under certain conditions, the best compromise between performance and cost, and promises the major development in the future for the Synthetic Aperture Radar computer power thirsty applications in the next years.

  7. Parallel Power Grid Simulation Toolkit

    SciTech Connect

    Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol

    2015-09-14

    ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGid, named FSKIT, is intended to support the coupling multiple continuous and discrete even parallel simulations. The code is designed using modern object oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  8. Parallel fabrication of nanogap electrodes.

    PubMed

    Johnston, Danvers E; Strachan, Douglas R; Johnson, A T Charlie

    2007-09-01

    We have developed a technique for simultaneously fabricating large numbers of nanogaps in a single processing step using feedback-controlled electromigration. Parallel nanogap formation is achieved by a balanced simultaneous process that uses a novel arrangement of nanoscale shorts between narrow constrictions where the nanogaps form. Because of this balancing, the fabrication of multiple nanoelectrodes is similar to that of a single nanogap junction. The technique should be useful for constructing complex circuits of molecular-scale electronic devices.

  9. Massively parallel femtosecond laser processing.

    PubMed

    Hasegawa, Satoshi; Ito, Haruyasu; Toyoda, Haruyoshi; Hayasaki, Yoshio

    2016-08-01

    Massively parallel femtosecond laser processing with more than 1000 beams was demonstrated. Parallel beams were generated by a computer-generated hologram (CGH) displayed on a spatial light modulator (SLM). The key to this technique is to optimize the CGH in the laser processing system using a scheme called in-system optimization. It was analytically demonstrated that the number of beams is determined by the horizontal number of pixels in the SLM NSLM that is imaged at the pupil plane of an objective lens and a distance parameter pd obtained by dividing the distance between adjacent beams by the diffraction-limited beam diameter. A performance limitation of parallel laser processing in our system was estimated at NSLM of 250 and pd of 7.0. Based on these parameters, the maximum number of beams in a hexagonal close-packed structure was calculated to be 1189 by using an analytical equation. PMID:27505815

  10. Highly parallel sparse Cholesky factorization

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1990-01-01

    Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.

  11. A parallel trajectory optimization tool for aerospace plane guidance

    NASA Technical Reports Server (NTRS)

    Psiaki, Mark L.; Park, Kihong

    1991-01-01

    A parallel trajectory optimization algorithm is being developed. One possible mission is to provide real-time, on-line guidance for the National Aerospace Plane. The algorithm solves a discrete-time problem via the augmented Lagrangian nonlinear programming algorithm. The algorithm exploits the dynamic programming structure of the problem to achieve parallelism in calculating cost functions, gradients, constraints, Jacobians, Hessian approximations, search directions, and merit functions. Special additions to the augmented Lagrangian algorithm achieve robust convergence, achieve (almost) superlinear local convergence, and deal with constraint curvature efficiency. The algorithm can handle control and state inequality constraints such as angle-of-attack and dynamic pressure constraints. Portions of the algorithm have been tested. The nonlinear programming core algorithm performs well on a variety of static test problems and on an orbit transfer problem. The parallel search direction algorithm can reduce wall clock time by a factor of 10 for this part of the computation task.

  12. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  13. Allinea DDT as a Parallel Debugging Alternative to Totalview

    SciTech Connect

    Antypas, K.B.

    2007-03-05

    Totalview, from the Etnus Corporation, is a sophisticated and feature rich software debugger for parallel applications. As Totalview has gained in popularity and market share its pricing model has increased to the point where it is often prohibitively expensive for massively parallel supercomputers. Additionally, many of Totalview's advanced features are not used by members of the scientific computing community. For these reasons, supercomputing centers have begun to search for a basic parallel debugging tool which can be used as an alternative to Totalview. As the cost and complexity of Totalview has increased over the years, scientific computing centers have started searching for a viable parallel debugging alternative. DDT (Distributed Debugging Tool) from Allinea Software is a relatively new parallel debugging tool which aims to provide much of the same functionality as Totalview. This review outlines the basic features and limitations of DDT to determine if it can be a reasonable substitute for Totalview. DDT was tested on the NERSC platforms Bassi, Seaborg, Jacquard and Davinci with Fortran90, C, and C++ codes using MPI and OpenMP for parallelism.

  14. Parallel micromanipulation method for microassembly

    NASA Astrophysics Data System (ADS)

    Sin, Jeongsik; Stephanou, Harry E.

    2001-09-01

    Microassembly deals with micron or millimeter scale objects where the tolerance requirements are in the micron range. Typical applications include electronics components (silicon fabricated circuits), optoelectronics components (photo detectors, emitters, amplifiers, optical fibers, microlenses, etc.), and MEMS (Micro-Electro-Mechanical-System) dies. The assembly processes generally require not only high precision but also high throughput at low manufacturing cost. While conventional macroscale assembly methods have been utilized in scaled down versions for microassembly applications, they exhibit limitations on throughput and cost due to the inherently serialized process. Since the assembly process depends heavily on the manipulation performance, an efficient manipulation method for small parts will have a significant impact on the manufacturing of miniaturized products. The objective of this study on 'parallel micromanipulation' is to achieve these three requirements through the handling of multiple small parts simultaneously (in parallel) with high precision (micromanipulation). As a step toward this objective, a new manipulation method is introduced. The method uses a distributed actuation array for gripper free and parallel manipulation, and a centralized, shared actuator for simplified controls. The method has been implemented on a testbed 'Piezo Active Surface (PAS)' in which an actively generated friction force field is the driving force for part manipulation. Basic motion primitives, such as translation and rotation of objects, are made possible with the proposed method. This study discusses the design of the proposed manipulation method PAS, and the corresponding manipulation mechanism. The PAS consists of two piezoelectric actuators for X and Y motion, two linear motion guides, two sets of nozzle arrays, and solenoid valves to switch the pneumatic suction force on and off in individual nozzles. One array of nozzles is fixed relative to the surface on

  15. Rochester checkers player: Multi-model parallel programming for animate vision. Technical report

    SciTech Connect

    Marsh, B.D.; Brown, C.M.; LeBlanc, T.J.; Scott, M.L.; Becker, T.G.

    1991-06-01

    Animate vision systems couple computer vision and robotics to achieve robust and accurate vision, as well as other complex behavior. These systems combine low-level sensory processing and effector output with high-level cognitive planning - all computationally intensive tasks that can benefit from parallel processing. No single model of parallel programming is likely to serve for all tasks, however. Early vision algorithms are intensely data parallel, often utilizing fine-grain parallel computations that share an image, while cognition algorithms decompose naturally by function, often consisting of loosely-coupled, coarse-grain parallel units. A typical animate vision application will likely consist of many tasks, each of which may require a different parallel programming model, and all of which must cooperate to achieve the desired behavior. These multi-model programs require an underlying software system that not only supports several different models of parallel computation simultaneously, but which also allows tasks implemented in different models to interact.

  16. A numerical differentiation library exploiting parallel architectures

    NASA Astrophysics Data System (ADS)

    Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

    2009-08-01

    We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h), and O(h), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summaryProgram title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such

  17. Data-Parallel Octrees for Surface Reconstruction.

    PubMed

    Kun Zhou; Minmin Gong; Xin Huang; Baining Guo

    2011-05-01

    We present the first parallel surface reconstruction algorithm that runs entirely on the GPU. Like existing implicit surface reconstruction methods, our algorithm first builds an octree for the given set of oriented points, then computes an implicit function over the space of the octree, and finally extracts an isosurface as a watertight triangle mesh. A key component of our algorithm is a novel technique for octree construction on the GPU. This technique builds octrees in real time and uses level-order traversals to exploit the fine-grained parallelism of the GPU. Moreover, the technique produces octrees that provide fast access to the neighborhood information of each octree node, which is critical for fast GPU surface reconstruction. With an octree so constructed, our GPU algorithm performs Poisson surface reconstruction, which produces high-quality surfaces through a global optimization. Given a set of 500 K points, our algorithm runs at the rate of about five frames per second, which is over two orders of magnitude faster than previous CPU algorithms. To demonstrate the potential of our algorithm, we propose a user-guided surface reconstruction technique which reduces the topological ambiguity and improves reconstruction results for imperfect scan data. We also show how to use our algorithm to perform on-the-fly conversion from dynamic point clouds to surfaces as well as to reconstruct fluid surfaces for real-time fluid simulation.

  18. Empirical study of parallel LRU simulation algorithms

    NASA Technical Reports Server (NTRS)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithm are more complex, but have costs that are independent on the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithm implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.

  19. How to write fast and clear parallel programs using algebra

    SciTech Connect

    Stiller, L. Johns Hopkins Univ., Baltimore, MD )

    1992-01-01

    An algebraic method for the design of efficient and easy to port codes for parallel machines is described. The method was applied to speed up and to clarify certain communication functions, n-body codes, a biomolecular analysis, and a chess problem.

  20. How to write fast and clear parallel programs using algebra

    SciTech Connect

    Stiller, L. |

    1992-10-01

    An algebraic method for the design of efficient and easy to port codes for parallel machines is described. The method was applied to speed up and to clarify certain communication functions, n-body codes, a biomolecular analysis, and a chess problem.

  1. Parallelizing alternating direction implicit solver on GPUs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...

  2. Implementing clips on a parallel computer

    NASA Technical Reports Server (NTRS)

    Riley, Gary

    1987-01-01

    The C language integrated production system (CLIPS) is a forward chaining rule based language to provide training and delivery for expert systems. Conceptually, rule based languages have great potential for benefiting from the inherent parallelism of the algorithms that they employ. During each cycle of execution, a knowledge base of information is compared against a set of rules to determine if any rules are applicable. Parallelism also can be employed for use with multiple cooperating expert systems. To investigate the potential benefits of using a parallel computer to speed up the comparison of facts to rules in expert systems, a parallel version of CLIPS was developed for the FLEX/32, a large grain parallel computer. The FLEX implementation takes a macroscopic approach in achieving parallelism by splitting whole sets of rules among several processors rather than by splitting the components of an individual rule among processors. The parallel CLIPS prototype demonstrates the potential advantages of integrating expert system tools with parallel computers.

  3. Parallel evolution of chimeric fusion genes.

    PubMed

    Jones, Corbin D; Begun, David J

    2005-08-01

    To understand how novel functions arise, we must identify common patterns and mechanisms shaping the evolution of new genes. Here, we take advantage of data from three Drosophila genes, jingwei, Adh-Finnegan, and Adh-Twain, to find evolutionary patterns and mechanisms governing the evolution of new genes. All three of these genes are independently derived from Adh, which enabled us to use the extensive literature on Adh in Drosophila to guide our analyses. We discovered a fundamental similarity in the temporal, spatial, and types of amino acid changes that occurred. All three genes underwent rapid adaptive amino acid evolution shortly after they were formed, followed by later quiescence and functional constraint. These genes also show striking parallels in which amino acids change in the Adh region. We showed that these early changes tend to occur at amino acid residues that seldom, if ever, evolve in Drosophila Adh. Changes at these slowly evolving sites are usually associated with loss of function or hypomorphic mutations in Drosophila melanogaster. Our data indicate that shifting away from ancestral functions may be a critical step early in the evolution of chimeric fusion genes. We suggest that the patterns we observed are both general and predictive.

  4. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  5. Force user's manual: A portable, parallel FORTRAN

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1990-01-01

    The use of Force, a parallel, portable FORTRAN on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, Alliant computers on which it is installed.

  6. Parallel machine architecture and compiler design facilities

    NASA Technical Reports Server (NTRS)

    Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

    1990-01-01

    The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of Delta project (which objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target toward different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.

  7. High Performance Parallel Computational Nanotechnology

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. In as much as it seems clear that the application of such methods in nanotechnology will require powerful, highly powerful systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to

  8. Parallel MRI at microtesla fields.

    PubMed

    Zotev, Vadim S; Volegov, Petr L; Matlashov, Andrei N; Espy, Michelle A; Mosher, John C; Kraus, Robert H

    2008-06-01

    Parallel imaging techniques have been widely used in high-field magnetic resonance imaging (MRI). Multiple receiver coils have been shown to improve image quality and allow accelerated image acquisition. Magnetic resonance imaging at ultra-low fields (ULF MRI) is a new imaging approach that uses SQUID (superconducting quantum interference device) sensors to measure the spatially encoded precession of pre-polarized nuclear spin populations at microtesla-range measurement fields. In this work, parallel imaging at microtesla fields is systematically studied for the first time. A seven-channel SQUID system, designed for both ULF MRI and magnetoencephalography (MEG), is used to acquire 3D images of a human hand, as well as 2D images of a large water phantom. The imaging is performed at 46 mu T measurement field with pre-polarization at 40 mT. It is shown how the use of seven channels increases imaging field of view and improves signal-to-noise ratio for the hand images. A simple procedure for approximate correction of concomitant gradient artifacts is described. Noise propagation is analyzed experimentally, and the main source of correlated noise is identified. Accelerated imaging based on one-dimensional undersampling and 1D SENSE (sensitivity encoding) image reconstruction is studied in the case of the 2D phantom. Actual threefold imaging acceleration in comparison to single-average fully encoded Fourier imaging is demonstrated. These results show that parallel imaging methods are efficient in ULF MRI, and that imaging performance of SQUID-based instruments improves substantially as the number of channels is increased.

  9. Parallel MRI at microtesla fields

    NASA Astrophysics Data System (ADS)

    Zotev, Vadim S.; Volegov, Petr L.; Matlashov, Andrei N.; Espy, Michelle A.; Mosher, John C.; Kraus, Robert H.

    2008-06-01

    Parallel imaging techniques have been widely used in high-field magnetic resonance imaging (MRI). Multiple receiver coils have been shown to improve image quality and allow accelerated image acquisition. Magnetic resonance imaging at ultra-low fields (ULF MRI) is a new imaging approach that uses SQUID (superconducting quantum interference device) sensors to measure the spatially encoded precession of pre-polarized nuclear spin populations at microtesla-range measurement fields. In this work, parallel imaging at microtesla fields is systematically studied for the first time. A seven-channel SQUID system, designed for both ULF MRI and magnetoencephalography (MEG), is used to acquire 3D images of a human hand, as well as 2D images of a large water phantom. The imaging is performed at 46 μT measurement field with pre-polarization at 40 mT. It is shown how the use of seven channels increases imaging field of view and improves signal-to-noise ratio for the hand images. A simple procedure for approximate correction of concomitant gradient artifacts is described. Noise propagation is analyzed experimentally, and the main source of correlated noise is identified. Accelerated imaging based on one-dimensional undersampling and 1D SENSE (sensitivity encoding) image reconstruction is studied in the case of the 2D phantom. Actual threefold imaging acceleration in comparison to single-average fully encoded Fourier imaging is demonstrated. These results show that parallel imaging methods are efficient in ULF MRI, and that imaging performance of SQUID-based instruments improves substantially as the number of channels is increased.

  10. The PARTY parallel runtime system

    NASA Technical Reports Server (NTRS)

    Saltz, J. H.; Mirchandaney, Ravi; Smith, R. M.; Crowley, Kay; Nicol, D. M.

    1989-01-01

    In the present automated system for the organization of the data and computational operations entailed by parallel problems, in ways that optimize multiprocessor performance, general heuristics for partitioning program data and control are implemented by capturing and manipulating representations of a computation at run time. These heuristics are directed toward the dynamic identification and allocation of concurrent work in computations with irregular computational patterns. An optimized static-workload partitioning is computed for such repetitive-computation pattern problems as the iterative ones employed in scientific computation.

  11. Heart Fibrillation and Parallel Supercomputers

    NASA Technical Reports Server (NTRS)

    Kogan, B. Y.; Karplus, W. J.; Chudin, E. E.

    1997-01-01

    The Luo and Rudy 3 cardiac cell mathematical model is implemented on the parallel supercomputer CRAY - T3D. The splitting algorithm combined with variable time step and an explicit method of integration provide reasonable solution times and almost perfect scaling for rectilinear wave propagation. The computer simulation makes it possible to observe new phenomena: the break-up of spiral waves caused by intracellular calcium and dynamics and the non-uniformity of the calcium distribution in space during the onset of the spiral wave.

  12. Parallel Assembly of LIGA Components

    SciTech Connect

    Christenson, T.R.; Feddema, J.T.

    1999-03-04

    In this paper, a prototype robotic workcell for the parallel assembly of LIGA components is described. A Cartesian robot is used to press 386 and 485 micron diameter pins into a LIGA substrate and then place a 3-inch diameter wafer with LIGA gears onto the pins. Upward and downward looking microscopes are used to locate holes in the LIGA substrate, pins to be pressed in the holes, and gears to be placed on the pins. This vision system can locate parts within 3 microns, while the Cartesian manipulator can place the parts within 0.4 microns.

  13. PKDGRAV3: Parallel gravity code

    NASA Astrophysics Data System (ADS)

    Potter, Douglas; Stadel, Joachim

    2016-09-01

    Pkdgrav3 is an 𝒪(N) gravity calculation method; it uses a binary tree algorithm with fifth order fast multipole expansion of the gravitational potential, using cell-cell interactions. Periodic boundaries conditions require very little data movement and allow a high degree of parallelism; the code includes GPU acceleration for all force calculations, leading to a significant speed-up with respect to previous versions (ascl:1305.005). Pkdgrav3 also has a sophisticated time-stepping criterion based on an estimation of the local dynamical time.

  14. True Shear Parallel Plate Viscometer

    NASA Technical Reports Server (NTRS)

    Ethridge, Edwin; Kaukler, William

    2010-01-01

    This viscometer (which can also be used as a rheometer) is designed for use with liquids over a large temperature range. The device consists of horizontally disposed, similarly sized, parallel plates with a precisely known gap. The lower plate is driven laterally with a motor to apply shear to the liquid in the gap. The upper plate is freely suspended from a double-arm pendulum with a sufficiently long radius to reduce height variations during the swing to negligible levels. A sensitive load cell measures the shear force applied by the liquid to the upper plate. Viscosity is measured by taking the ratio of shear stress to shear rate.

  15. Scalable Parallel Algebraic Multigrid Solvers

    SciTech Connect

    Bank, R; Lu, S; Tong, C; Vassilevski, P

    2005-03-23

    The authors propose a parallel algebraic multilevel algorithm (AMG), which has the novel feature that the subproblem residing in each processor is defined over the entire partition domain, although the vast majority of unknowns for each subproblem are associated with the partition owned by the corresponding processor. This feature ensures that a global coarse description of the problem is contained within each of the subproblems. The advantages of this approach are that interprocessor communication is minimized in the solution process while an optimal order of convergence rate is preserved; and the speed of local subproblem solvers can be maximized using the best existing sequential algebraic solvers.

  16. Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

    ERIC Educational Resources Information Center

    Agarwal, Mayank

    2009-01-01

    The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…

  17. Parallel Computing Using Web Servers and "Servlets".

    ERIC Educational Resources Information Center

    Lo, Alfred; Bloor, Chris; Choi, Y. K.

    2000-01-01

    Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

  18. Exploring Parallel Concordancing in English and Chinese.

    ERIC Educational Resources Information Center

    Lixun, Wang

    2001-01-01

    Investigates the value of computer technology as a medium for the delivery of parallel texts in English and Chinese for language learning. A English-Chinese parallel corpus was created for use in parallel concordancing--a technique that has been developed to respond to the desire to study language in its natural contexts of use. (Author/VWL)

  19. Parallel Processing at the High School Level.

    ERIC Educational Resources Information Center

    Sheary, Kathryn Anne

    This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

  20. Reservoir Thermal Recover Simulation on Parallel Computers

    NASA Astrophysics Data System (ADS)

    Li, Baoyan; Ma, Yuanle

    The rapid development of parallel computers has provided a hardware background for massive refine reservoir simulation. However, the lack of parallel reservoir simulation software has blocked the application of parallel computers on reservoir simulation. Although a variety of parallel methods have been studied and applied to black oil, compositional, and chemical model numerical simulations, there has been limited parallel software available for reservoir simulation. Especially, the parallelization study of reservoir thermal recovery simulation has not been fully carried out, because of the complexity of its models and algorithms. The authors make use of the message passing interface (MPI) standard communication library, the domain decomposition method, the block Jacobi iteration algorithm, and the dynamic memory allocation technique to parallelize their serial thermal recovery simulation software NUMSIP, which is being used in petroleum industry in China. The parallel software PNUMSIP was tested on both IBM SP2 and Dawn 1000A distributed-memory parallel computers. The experiment results show that the parallelization of I/O has great effects on the efficiency of parallel software PNUMSIP; the data communication bandwidth is also an important factor, which has an influence on software efficiency. Keywords: domain decomposition method, block Jacobi iteration algorithm, reservoir thermal recovery simulation, distributed-memory parallel computer

  1. Development of massively parallel quantum chemistry program SMASH

    NASA Astrophysics Data System (ADS)

    Ishimura, Kazuya

    2015-12-01

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  2. Analysis of the Rotopod: An all revolute parallel manipulator

    SciTech Connect

    Schmitt, D.J.; Benavides, G.L.; Bieg, L.F.; Kozlowski, D.M.

    1998-05-16

    This paper introduces a new configuration of parallel manipulator call the Rotopod which is constructed from all revolute type joints. The Rotopod consists of two platforms connected by six legs and exhibits six Cartesian degrees of freedom. The Rotopod is initially compared with other all revolute joint parallel manipulators to show its similarities and differences. The inverse kinematics for this mechanism are developed and used to analyze the accessible workspace of the mechanism. Optimization is performed to determine the Rotopod design configurations which maximum the accessible workspace based on desirable functional constraints.

  3. Local and nonlocal parallel heat transport in general magnetic fields

    SciTech Connect

    Del-Castillo-Negrete, Diego B; Chacon, Luis

    2011-01-01

    A novel approach for the study of parallel transport in magnetized plasmas is presented. The method avoids numerical pollution issues of grid-based formulations and applies to integrable and chaotic magnetic fields with local or nonlocal parallel closures. In weakly chaotic fields, the method gives the fractal structure of the devil's staircase radial temperature profile. In fully chaotic fields, the temperature exhibits self-similar spatiotemporal evolution with a stretched-exponential scaling function for local closures and an algebraically decaying one for nonlocal closures. It is shown that, for both closures, the effective radial heat transport is incompatible with the quasilinear diffusion model.

  4. Development of massively parallel quantum chemistry program SMASH

    SciTech Connect

    Ishimura, Kazuya

    2015-12-31

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C{sub 150}H{sub 30}){sub 2} with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  5. A massively asynchronous, parallel brain.

    PubMed

    Zeki, Semir

    2015-05-19

    Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously--with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain. PMID:25823871

  6. Xyce parallel electronic simulator design.

    SciTech Connect

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to insure a high level of code quality and robustness are essential. Version control, issue tracking customer support, C++ style guildlines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathmaticians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

  7. Efficient parallel global garbage collection on massively parallel computers

    SciTech Connect

    Kamada, Tomio; Matsuoka, Satoshi; Yonezawa, Akinori

    1994-12-31

    On distributed-memory high-performance MPPs where processors are interconnected by an asynchronous network, efficient Garbage Collection (GC) becomes difficult due to inter-node references and references within pending, unprocessed messages. The parallel global GC algorithm (1) takes advantage of reference locality, (2) efficiently traverses references over nodes, (3) admits minimum pause time of ongoing computations, and (4) has been shown to scale up to 1024 node MPPs. The algorithm employs a global weight counting scheme to substantially reduce message traffic. The two methods for confirming the arrival of pending messages are used: one counts numbers of messages and the other uses network `bulldozing.` Performance evaluation in actual implementations on a multicomputer with 32-1024 nodes, Fujitsu AP1000, reveals various favorable properties of the algorithm.

  8. Implementation and performance of parallelized elegant.

    SciTech Connect

    Wang, Y.; Borland, M.; Accelerator Systems Division

    2008-01-01

    The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.

  9. New Parallel computing framework for radiation transport codes

    SciTech Connect

    Kostin, M.A.; Mokhov, N.V.; Niita, K.; /JAERI, Tokai

    2010-09-01

    A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one thus combining results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.

  10. Experience of implementing applicative parallelism on Cray X-MP

    SciTech Connect

    Lee, Ching-Cheng

    1988-05-01

    The traditional approach on Cray multiprocessing has been multitasking in FORTRAN environments with operating system support. This kind of approach usually requires explicit user intervention to restructure or reformulated the algorithms to control parallel execution. However, it is error-prone and in most cases, only coarse-grain parallelism such as subrountines or functions was explored. In this paper, an implemention of micro-tasking for an applicative language called SISAL on Cray X-MP is reported. The experience indicates where the automatic approach to parallelization works well, as well as sources of inefficient behavior on the Cray X-MP. The experience also suggests some future improvements of SISAL multiprocessing on Cray X-MP. 9 refs., 1 fig.

  11. Parallel algorithm of VLBI software correlator under multiprocessor environment

    NASA Astrophysics Data System (ADS)

    Zheng, Weimin; Zhang, Dong

    2007-11-01

    The correlator is the key signal processing equipment of a Very Lone Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass data collected by the VLBI observatories and produces the visibility function of the target, which can be used to spacecraft position, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is a task of data intensive and computation intensive. This paper presents the algorithms of two parallel software correlators under multiprocessor environments. A near real-time correlator for spacecraft tracking adopts the pipelining and thread-parallel technology, and runs on the SMP (Symmetric Multiple Processor) servers. Another high speed prototype correlator using the mixed Pthreads and MPI (Massage Passing Interface) parallel algorithm is realized on a small Beowulf cluster platform. Both correlators have the characteristic of flexible structure, scalability, and with 10-station data correlating abilities.

  12. Automating the parallel processing of fluid and structural dynamics calculations

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.; Cole, Gary L.

    1987-01-01

    The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilties to systems of partial differential equations, that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular, the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.

  13. Automating the parallel processing of fluid and structural dynamics calculations

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.; Cole, Gary L.

    1987-01-01

    The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations, that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular, the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.

  14. Parallelization strategy for large-scale vibronic coupling calculations.

    PubMed

    Rabidoux, Scott M; Eijkhout, Victor; Stanton, John F

    2014-12-26

    The vibronic coupling model of Köppel, Domcke, and Cederbaum is a powerful means to understand, predict, and analyze electronic spectra of molecules, especially those that exhibit phenomena that involve breakdown of the Born-Oppenheimer approximation. In this work, we describe a new parallel algorithm for carrying out such calculations. The algorithm is conceptually founded upon a "stencil" representation of the required computational steps, which motivates an efficient strategy for coarse-grained parallelization. The equations involved in the direct-CI type diagonalization of the model Hamiltonian are presented, the parallelization strategy is discussed in detail, and the method is illustrated by calculations involving direct-product basis sets with as many as 17 vibrational modes and 130 billion basis functions. PMID:25295469

  15. Adaptive, multiresolution visualization of large data sets using parallel octrees.

    SciTech Connect

    Freitag, L. A.; Loy, R. M.

    1999-06-10

    The interactive visualization and exploration of large scientific data sets is a challenging and difficult task; their size often far exceeds the performance and memory capacity of even the most powerful graphics work-stations. To address this problem, we have created a technique that combines hierarchical data reduction methods with parallel computing to allow interactive exploration of large data sets while retaining full-resolution capability. The hierarchical representation is built in parallel by strategically inserting field data into an octree data structure. We provide functionality that allows the user to interactively adapt the resolution of the reduced data sets so that resolution is increased in regions of interest without sacrificing local graphics performance. We describe the creation of the reduced data sets using a parallel octree, the software architecture of the system, and the performance of this system on the data from a Rayleigh-Taylor instability simulation.

  16. Parallel job scheduling policies to improve fairness : a case study.

    SciTech Connect

    Leung, Vitus Joseph; Sabin, Gerald; Sadayappan, Ponnuswamy

    2008-02-01

    Balancing fairness, user performance, and system performance is a critical concern when developing and installing parallel schedulers. Sandia uses a customized scheduler to manage many of their parallel machines. A primary function of the scheduler is to ensure that the machines have good utilization and that users are treated in a 'fair' manner. A separate compute process allocator (CPA) ensures that the jobs on the machines are not too fragmented in order to maximize throughput. Until recently, there has been no established technique to measure the fairness of parallel job schedulers. This paper introduces a 'hybrid' fairness metric that is similar to recently proposed metrics. The metric uses the Sandia version of a 'fairshare' queuing priority as the basis for fairness. The hybrid fairness metric is used to evaluate a Sandia workload. Using these results, multiple scheduling strategies are introduced to improve performance while satisfying user and system performance constraints.

  17. Parallel iterative solvers and preconditioners using approximate hierarchical methods

    SciTech Connect

    Grama, A.; Kumar, V.; Sameh, A.

    1996-12-31

    In this paper, we report results of the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on the convergence and show that with minimal loss in accuracy, our solver yields significant speedups. We demonstrate the excellent parallel efficiency and scalability of our solver. The combined speedups from approximation and parallelism represent an improvement of several orders in solution time. We also develop fast and paralellizable preconditioners for this problem. We report on the performance of an inner-outer scheme and a preconditioner based on truncated Green`s function. Experimental results on a 256 processor Cray T3D are presented.

  18. Parallel garbage collection on a virtual memory system

    SciTech Connect

    Abraham, S.G.; Patel, J.H.

    1987-01-01

    Since most artificial intelligence applications are programmed in list processing languages, it is important to design architectures to support efficient garbage collection. This paper presents an architecture and an associated algorithm for parallel garbage collection on a virtual memory system. All the previously proposed parallel algorithms attempt to collect cells released by the list processor during the garbage collection cycle. We do not attempt to collect such cells. As a consequence, the list processor incurs little overhead in the proposed scheme, since it need not synchronize with the collector. Most parallel algorithms are designed for shared memory machines which have certain implicit synchronization functions on variable access. The proposed algorithm is designed for virtual memory systems where both the list processor and the garbage collector have private memories. The enforcement of coherence between the two private memories can be expensive and is not necessary in our scheme. 15 refs., 3 figs.

  19. QCMPI: A parallel environment for quantum computing

    NASA Astrophysics Data System (ADS)

    Tabakin, Frank; Juliá-Díaz, Bruno

    2009-06-01

    QCMPI is a quantum computer (QC) simulation package written in Fortran 90 with parallel processing capabilities. It is an accessible research tool that permits rapid evaluation of quantum algorithms for a large number of qubits and for various "noise" scenarios. The prime motivation for developing QCMPI is to facilitate numerical examination of not only how QC algorithms work, but also to include noise, decoherence, and attenuation effects and to evaluate the efficacy of error correction schemes. The present work builds on an earlier Mathematica code QDENSITY, which is mainly a pedagogic tool. In that earlier work, although the density matrix formulation was featured, the description using state vectors was also provided. In QCMPI, the stress is on state vectors, in order to employ a large number of qubits. The parallel processing feature is implemented by using the Message-Passing Interface (MPI) protocol. A description of how to spread the wave function components over many processors is provided, along with how to efficiently describe the action of general one- and two-qubit operators on these state vectors. These operators include the standard Pauli, Hadamard, CNOT and CPHASE gates and also Quantum Fourier transformation. These operators make up the actions needed in QC. Codes for Grover's search and Shor's factoring algorithms are provided as examples. A major feature of this work is that concurrent versions of the algorithms can be evaluated with each version subject to alternate noise effects, which corresponds to the idea of solving a stochastic Schrödinger equation. The density matrix for the ensemble of such noise cases is constructed using parallel distribution methods to evaluate its eigenvalues and associated entropy. Potential applications of this powerful tool include studies of the stability and correction of QC processes using Hamiltonian based dynamics. Program summaryProgram title: QCMPI Catalogue identifier: AECS_v1_0 Program summary URL

  20. PREFACE: INERA Workshop: Transition Metal Oxide Thin Films-functional Layers in "Smart windows" and Water Splitting Devices. Parallel session of the 18th International School on Condensed Matter Physics

    NASA Astrophysics Data System (ADS)

    2014-11-01

    The Special issue presents the papers for the INERA Workshop entitled "Transition Metal Oxides as Functional Layers in Smart windows and Water Splitting Devices", which was held in Varna, St. Konstantin and Elena, Bulgaria, from the 4th-6th September 2014. The Workshop is organized within the context of the INERA "Research and Innovation Capacity Strengthening of ISSP-BAS in Multifunctional Nanostructures", FP7 Project REGPOT 316309 program, European project of the Institute of Solid State Physics at the Bulgarian Academy of Sciences. There were 42 participants at the workshop, 16 from Sweden, Germany, Romania and Hungary, 11 invited lecturers, and 28 young participants. There were researchers present from prestigious European laboratories which are leaders in the field of transition metal oxide thin film technologies. The event contributed to training young researchers in innovative thin film technologies, as well as thin films characterization techniques. The topics of the Workshop cover the field of technology and investigation of thin oxide films as functional layers in "Smart windows" and "Water splitting" devices. The topics are related to the application of novel technologies for the preparation of transition metal oxide films and the modification of chromogenic properties towards the improvement of electrochromic and termochromic device parameters for possible industrial deployment. The Workshop addressed the following topics: Metal oxide films-functional layers in energy efficient devices; Photocatalysts and chemical sensing; Novel thin film technologies and applications; Methods of thin films characterizations; From the 37 abstracts sent, 21 manuscripts were written and later refereed. We appreciate the comments from all the referees, and we are grateful for their valuable contributions. Guest Editors: Assoc. Prof. Dr.Tatyana Ivanova Prof. DSc Kostadinka Gesheva Prof. DSc Hassan Chamatti Assoc. Prof. Dr. Georgi Popkirov Workshop Organizing Committee Prof

  1. Analysis of multigrid methods on massively parallel computers: Architectural implications

    NASA Technical Reports Server (NTRS)

    Matheson, Lesley R.; Tarjan, Robert E.

    1993-01-01

    We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages, (up to 1000 words) or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.

  2. A parallel coupled oceanic-atmospheric general circulation model

    SciTech Connect

    Wehner, M.F.; Bourgeois, A.J.; Eltgroth, P.G.; Duffy, P.B.; Dannevik, W.P.

    1994-12-01

    The Climate Systems Modeling group at LLNL has developed a portable coupled oceanic-atmospheric general circulation model suitable for use on a variety of massively parallel (MPP) computers of the multiple instruction, multiple data (MIMD) class. The model is composed of parallel versions of the UCLA atmospheric general circulation model, the GFDL modular ocean model (MOM) and a dynamic sea ice model based on the Hiber formulation extracted from the OPYC ocean model. The strategy to achieve parallelism is twofold. One level of parallelism is accomplished by applying two dimensional domain decomposition techniques to each of the three constituent submodels. A second level of parallelism is attained by a concurrent execution of AGCM and OGCM/sea ice components on separate sets of processors. For this functional decomposition scheme, a flux coupling module has been written to calculate the heat, moisture and momentum fluxes independent of either the AGCM or the OGCM modules. The flux coupler`s other roles are to facilitate the transfer of data between subsystem components and processors via message passing techniques and to interpolate and aggregate between the possibly incommensurate meshes.

  3. A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database

    DOE PAGESBeta

    Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; Shephard, Mark S.

    2013-01-01

    Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm of creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order inmore » a 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.« less

  4. Non-intrusive parallelization of multibody system dynamic simulations

    NASA Astrophysics Data System (ADS)

    González, Francisco; Luaces, Alberto; Lugrís, Urbano; González, Manuel

    2009-09-01

    This paper evaluates two non-intrusive parallelization techniques for multibody system dynamics: parallel sparse linear equation solvers and OpenMP. Both techniques can be applied to existing simulation software with minimal changes in the code structure; this is a major advantage over Message Passing Interface, the standard parallelization method in multibody dynamics. Both techniques have been applied to parallelize a starting sequential implementation of a global index-3 augmented Lagrangian formulation combined with the trapezoidal rule as numerical integrator, in order to solve the forward dynamics of a variable-loop four-bar mechanism. Numerical experiments have been performed to measure the efficiency as a function of problem size and matrix filling. Results show that the best parallel solver (Pardiso) performs better than the best sequential solver (CHOLMOD) for multibody problems of large and medium sizes leading to matrix fillings above 10. OpenMP also proved to be advantageous even for problems of small sizes. Both techniques delivered speedups above 70% of the maximum theoretical values for a wide range of multibody problems.

  5. FermiQCD: A tool kit for parallel lattice QCD applications

    SciTech Connect

    Di Pierro, M.

    2002-03-01

    We present here the most recent version of FermiQCD, a collection of C++ classes, functions and parallel algorithms for lattice QCD, based on Matrix Distributed Processing. FermiQCD allows fast development of parallel lattice applications and includes some SSE2 optimizations for clusters of Pentium 4 PCs.

  6. APPSPACK 4.0 : asynchronous parallel pattern search for derivative-free optimization.

    SciTech Connect

    Gray, Genetha Anne; Kolda, Tamara Gibson

    2004-12-01

    APPSPACK is software for solving unconstrained and bound constrained optimization problems. It implements an asynchronous parallel pattern search method that has been specifically designed for problems characterized by expensive function evaluations. Using APPSPACK to solve optimization problems has several advantages: No derivative information is needed; the procedure for evaluating the objective function can be executed via a separate program or script; the code can be run in serial or parallel, regardless of whether or not the function evaluation itself is parallel; and the software is freely available. We describe the underlying algorithm, data structures, and features of APPSPACK version 4.0 as well as how to use and customize the software.

  7. Embodied and Distributed Parallel DJing.

    PubMed

    Cappelen, Birgitta; Andersson, Anders-Petter

    2016-01-01

    Everyone has a right to take part in cultural events and activities, such as music performances and music making. Enforcing that right, within Universal Design, is often limited to a focus on physical access to public areas, hearing aids etc., or groups of persons with special needs performing in traditional ways. The latter might be people with disabilities, being musicians playing traditional instruments, or actors playing theatre. In this paper we focus on the innovative potential of including people with special needs, when creating new cultural activities. In our project RHYME our goal was to create health promoting activities for children with severe disabilities, by developing new musical and multimedia technologies. Because of the users' extreme demands and rich contribution, we ended up creating both a new genre of musical instruments and a new art form. We call this new art form Embodied and Distributed Parallel DJing, and the new genre of instruments for Empowering Multi-Sensorial Things.

  8. Parallel network simulations with NEURON.

    PubMed

    Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L

    2006-10-01

    The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.

  9. Embodied and Distributed Parallel DJing.

    PubMed

    Cappelen, Birgitta; Andersson, Anders-Petter

    2016-01-01

    Everyone has a right to take part in cultural events and activities, such as music performances and music making. Enforcing that right, within Universal Design, is often limited to a focus on physical access to public areas, hearing aids etc., or groups of persons with special needs performing in traditional ways. The latter might be people with disabilities, being musicians playing traditional instruments, or actors playing theatre. In this paper we focus on the innovative potential of including people with special needs, when creating new cultural activities. In our project RHYME our goal was to create health promoting activities for children with severe disabilities, by developing new musical and multimedia technologies. Because of the users' extreme demands and rich contribution, we ended up creating both a new genre of musical instruments and a new art form. We call this new art form Embodied and Distributed Parallel DJing, and the new genre of instruments for Empowering Multi-Sensorial Things. PMID:27534347

  10. Nanocapillary Adhesion between Parallel Plates.

    PubMed

    Cheng, Shengfeng; Robbins, Mark O

    2016-08-01

    Molecular dynamics simulations are used to study capillary adhesion from a nanometer scale liquid bridge between two parallel flat solid surfaces. The capillary force, Fcap, and the meniscus shape of the bridge are computed as the separation between the solid surfaces, h, is varied. Macroscopic theory predicts the meniscus shape and the contribution of liquid/vapor interfacial tension to Fcap quite accurately for separations as small as two or three molecular diameters (1-2 nm). However, the total capillary force differs in sign and magnitude from macroscopic theory for h ≲ 5 nm (8-10 diameters) because of molecular layering that is not included in macroscopic theory. For these small separations, the pressure tensor in the fluid becomes anisotropic. The components in the plane of the surface vary smoothly and are consistent with theory based on the macroscopic surface tension. Capillary adhesion is affected by only the perpendicular component, which has strong oscillations as the molecular layering changes.

  11. Parallel spinors on flat manifolds

    NASA Astrophysics Data System (ADS)

    Sadowski, Michał

    2006-05-01

    Let p(M) be the dimension of the vector space of parallel spinors on a closed spin manifold M. We prove that every finite group G is the holonomy group of a closed flat spin manifold M(G) such that p(M(G))>0. If the holonomy group Hol(M) of M is cyclic, then we give an explicit formula for p(M) another than that given in [R.J. Miatello, R.A. Podesta, The spectrum of twisted Dirac operators on compact flat manifolds, Trans. Am. Math. Soc., in press]. We answer the question when p(M)>0 if Hol(M) is a cyclic group of prime order or dim⁡M≤4.

  12. Information hiding in parallel programs

    SciTech Connect

    Foster, I.

    1992-01-30

    A fundamental principle in program design is to isolate difficult or changeable design decisions. Application of this principle to parallel programs requires identification of decisions that are difficult or subject to change, and the development of techniques for hiding these decisions. We experiment with three complex applications, and identify mapping, communication, and scheduling as areas in which decisions are particularly problematic. We develop computational abstractions that hide such decisions, and show that these abstractions can be used to develop elegant solutions to programming problems. In particular, they allow us to encode common structures, such as transforms, reductions, and meshes, as software cells and templates that can reused in different applications. An important characteristic of these structures is that they do not incorporate mapping, communication, or scheduling decisions: these aspects of the design are specified separately, when composing existing structures to form applications. This separation of concerns allows the same cells and templates to be reused in different contexts.

  13. Self-testing in parallel

    NASA Astrophysics Data System (ADS)

    McKague, Matthew

    2016-04-01

    Self-testing allows us to determine, through classical interaction only, whether some players in a non-local game share particular quantum states. Most work on self-testing has concentrated on developing tests for small states like one pair of maximally entangled qubits, or on tests where there is a separate player for each qubit, as in a graph state. Here we consider the case of testing many maximally entangled pairs of qubits shared between two players. Previously such a test was shown where testing is sequential, i.e., one pair is tested at a time. Here we consider the parallel case where all pairs are tested simultaneously, giving considerably more power to dishonest players. We derive sufficient conditions for a self-test for many maximally entangled pairs of qubits shared between two players and also two constructions for self-tests where all pairs are tested simultaneously.

  14. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1991-01-01

    The main contribution of the effort in the last two years is the introduction of the MOPPS system. After doing extensive literature search, we introduced the system which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications on a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.

  15. Nanocapillary Adhesion between Parallel Plates.

    PubMed

    Cheng, Shengfeng; Robbins, Mark O

    2016-08-01

    Molecular dynamics simulations are used to study capillary adhesion from a nanometer scale liquid bridge between two parallel flat solid surfaces. The capillary force, Fcap, and the meniscus shape of the bridge are computed as the separation between the solid surfaces, h, is varied. Macroscopic theory predicts the meniscus shape and the contribution of liquid/vapor interfacial tension to Fcap quite accurately for separations as small as two or three molecular diameters (1-2 nm). However, the total capillary force differs in sign and magnitude from macroscopic theory for h ≲ 5 nm (8-10 diameters) because of molecular layering that is not included in macroscopic theory. For these small separations, the pressure tensor in the fluid becomes anisotropic. The components in the plane of the surface vary smoothly and are consistent with theory based on the macroscopic surface tension. Capillary adhesion is affected by only the perpendicular component, which has strong oscillations as the molecular layering changes. PMID:27413872

  16. Parallel Performance Characterization of Columbia

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak

    2004-01-01

    Using a collection of benchmark problems of increasing levels of realism and computational effort, we will characterize the strengths and limitations of the 10,240 processor Columbia system to deliver supercomputing value to application scientists. Scientists need to be able to determine if and how they can utilize Columbia to carry extreme workloads, either in terms of ultra-large applications that cannot be run otherwise (capability), or in terms of very large ensembles of medium-scale applications to populate response matrices (capacity). We select existing application benchmarks that scale from a small number of processors to the entire machine, and that highlight different issues in running supercomputing-calss applicaions, such as the various types of memory access, file I/O, inter- and intra-node communications and parallelization paradigms. http://www.nas.nasa.gov/Software/NPB/

  17. A parallel dipole line system

    NASA Astrophysics Data System (ADS)

    Gunawan, Oki; Virgus, Yudistira; Tai, Kong Fai

    2015-02-01

    We present a study of a parallel linear distribution of dipole system, which can be realized using a pair of cylindrical diametric magnets and yields several interesting properties and applications. The system serves as a trap for cylindrical diamagnetic object, produces a fascinating one-dimensional camelback potential profile at its center plane, yields a technique for measuring magnetic susceptibility of the trapped object and serves as an ideal system to implement highly sensitive Hall measurement utilizing rotating magnetic field and lock-in detection. The latter application enables extraction of low carrier mobility in several materials of high interest such as the world-record-quality, earth abundant kesterite solar cell, and helps elucidate its fundamental performance limitation.

  18. Device for balancing parallel strings

    DOEpatents

    Mashikian, Matthew S.

    1985-01-01

    A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.

  19. Parallel computing in enterprise modeling.

    SciTech Connect

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principal makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  20. Integrated Task and Data Parallel Programming

    NASA Technical Reports Server (NTRS)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated

  1. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2014-10-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  2. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2013-10-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Preliminary results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  3. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2015-11-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fix boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.

  4. Development of parallel wire regenerator for cryocoolers

    NASA Astrophysics Data System (ADS)

    Nam, Kwanwoo; Jeong, Sangkwon

    2006-04-01

    This paper describes development of a novel regenerator geometry for cryocoolers. Parallel wire type is a wire bundle stacked in parallel with the flow in the housing, which is similar to a conventional parallel plate or tube. Simple and unique fabrication procedure is developed and fully depicted in this paper. Hydrodynamic and thermal experiments are performed to demonstrate the feasibility of the parallel wire regenerator. First, pressure drop characteristic of the parallel wire regenerator is compared to that of the screen mesh regenerator. Experimental result shows that the steady flow friction factor of the parallel wire type is three to five times smaller than that of the screen mesh type. Second, thermal ineffectiveness is determined by measuring the instantaneous pressure, the flow rate and the gas temperature at the warm and cold ends of the regenerator. The measured ineffectiveness of the parallel wire regenerator is larger than that of the screen regenerator due to the excessive axial conduction loss. To alleviate the intrinsic axial conduction loss of the parallel wire regenerator, segmentation is introduced and the experimental results reveal the favorable effect of the segmentation. Entropy generation calculation is adopted to compare the total losses between the screen regenerator and the parallel wire regenerator for various operating ranges. Simulation results show that the parallel wire regenerator can be an attractive candidate to improve cryocooler performance especially for the case of smaller NTU and lower cold-end temperature.

  5. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  6. The ParaScope parallel programming environment

    NASA Technical Reports Server (NTRS)

    Cooper, Keith D.; Hall, Mary W.; Hood, Robert T.; Kennedy, Ken; Mckinley, Kathryn S.; Mellor-Crummey, John M.; Torczon, Linda; Warren, Scott K.

    1993-01-01

    The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope's compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in FORTRAN D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.

  7. Parallel universes of Black Six biology

    PubMed Central

    2014-01-01

    Creation of lethal and synthetic lethal mutations in an experimental organism is a cornerstone of genetic dissection of gene function, and is related to the concept of an essential gene. Common inbred mouse strains carry background mutations, which can act as genetic modifiers, interfering with the assignment of gene essentiality. The inbred strain C57BL/6J, commonly known as “Black Six”, stands out, as it carries a spontaneous homozygous deletion in the nicotinamide nucleotide transhydrogenase (Nnt) gene [GenBank: AH009385.2], resulting in impairment of steroidogenic mitochondria of the adrenal gland, and a multitude of indirect modifier effects, coming from alteration of glucocorticoid-regulated processes. Over time, the popular strain has been used, by means of gene targeting technology, to assign “essential” and “redundant” qualifiers to numerous genes, thus creating an internally consistent “parallel universe” of knowledge. It is unrealistic to suggest phasing-out of this strain, given the scope of shared resources built around it, however, continuing on the road of “strain-unawareness” will result in profound waste of effort, particularly where translational research is concerned. The review analyzes the historical roots of this phenomenon and proposes that building of “parallel universes” should be urgently made visible to a critical reader by obligatory use of unambiguous and persistent tags in publications and databases, such as hypertext links, pointing to a vendor’s strain description web page, or to a digital object identifier (d.o.i.) of the original publication, so that any research done exclusively in C57BL/6J, could be easily identified. Reviewers This article was reviewed by Dr. Neil Smalheiser and Dr. Miguel Andrade-Navarro. PMID:25038798

  8. Parallel CARLOS-3D code development

    SciTech Connect

    Putnam, J.M.; Kotulski, J.D.

    1996-02-01

    CARLOS-3D is a three-dimensional scattering code which was developed under the sponsorship of the Electromagnetic Code Consortium, and is currently used by over 80 aerospace companies and government agencies. The code has been extensively validated and runs on both serial workstations and parallel super computers such as the Intel Paragon. CARLOS-3D is a three-dimensional surface integral equation scattering code based on a Galerkin method of moments formulation employing Rao- Wilton-Glisson roof-top basis for triangular faceted surfaces. Fully arbitrary 3D geometries composed of multiple conducting and homogeneous bulk dielectric materials can be modeled. This presentation describes some of the extensions to the CARLOS-3D code, and how the operator structure of the code facilitated these improvements. Body of revolution (BOR) and two-dimensional geometries were incorporated by simply including new input routines, and the appropriate Galerkin matrix operator routines. Some additional modifications were required in the combined field integral equation matrix generation routine due to the symmetric nature of the BOR and 2D operators. Quadrilateral patched surfaces with linear roof-top basis functions were also implemented in the same manner. Quadrilateral facets and triangular facets can be used in combination to more efficiently model geometries with both large smooth surfaces and surfaces with fine detail such as gaps and cracks. Since the parallel implementation in CARLOS-3D is at high level, these changes were independent of the computer platform being used. This approach minimizes code maintenance, while providing capabilities with little additional effort. Results are presented showing the performance and accuracy of the code for some large scattering problems. Comparisons between triangular faceted and quadrilateral faceted geometry representations will be shown for some complex scatterers.

  9. Isolation, incubation, and parallel functional testing and identification by FISH of rare microbial single-copy cells from multi-species mixtures using the combination of chemistrode and stochastic confinement.

    PubMed

    Liu, Weishan; Kim, Hyun Jung; Lucchetta, Elena M; Du, Wenbin; Ismagilov, Rustem F

    2009-08-01

    This paper illustrates a plug-based microfluidic approach combining the technique of the chemistrode and the principle of stochastic confinement, which can be used to i) starting from a mixture of cells, stochastically isolate single cells into plugs, ii) incubate the plugs to grow clones of the individual cells without competition among different clones, iii) split the plugs into arrays of identical daughter plugs, where each plug contained clones of the original cell, and iv) analyze each array by an independent technique, including cellulase assays, cultivation, cryo-preservation, Gram staining, and Fluorescence In Situ Hybridization (FISH). Functionally, this approach is equivalent to simultaneously assaying the clonal daughter cells by multiple killing and non-killing methods. A new protocol for single-cell FISH, a killing method, was developed to identify isolated cells of Paenibacillus curdlanolyticus in one array of daughter plugs using a 16S rRNA probe, Pc196. At the same time, live copies of P. curdlanolyticus in another array were obtained for cultivation. Among technical advances, this paper reports a chemistrode that enables sampling of nanoliter volumes directly from environmental specimens, such as soil slurries. In addition, a method for analyzing plugs is described: an array of droplets is deposited on the surface, and individual plugs are injected into the droplets of the surface array to induce a reaction and enable microscopy without distortions associated with curvature of plugs. The overall approach is attractive for identifying rare, slow growing microorganisms and would complement current methods to cultivate unculturable microbes from environmental samples.

  10. Measurements of parallel electron velocity distributions using whistler wave absorption

    SciTech Connect

    Thuecks, D. J.; Skiff, F.; Kletzing, C. A.

    2012-08-15

    We describe a diagnostic to measure the parallel electron velocity distribution in a magnetized plasma that is overdense ({omega}{sub pe} > {omega}{sub ce}). This technique utilizes resonant absorption of whistler waves by electrons with velocities parallel to a background magnetic field. The whistler waves were launched and received by a pair of dipole antennas immersed in a cylindrical discharge plasma at two positions along an axial background magnetic field. The whistler wave frequency was swept from somewhat below and up to the electron cyclotron frequency {omega}{sub ce}. As the frequency was swept, the wave was resonantly absorbed by the part of the electron phase space density which was Doppler shifted into resonance according to the relation {omega}-k{sub ||v||} = {omega}{sub ce}. The measured absorption is directly related to the reduced parallel electron distribution function integrated along the wave trajectory. The background theory and initial results from this diagnostic are presented here. Though this diagnostic is best suited to detect tail populations of the parallel electron distribution function, these first results show that this diagnostic is also rather successful in measuring the bulk plasma density and temperature both during the plasma discharge and into the afterglow.

  11. Berkeley Unified Parallel C (UPC) Runtime Library

    2003-03-31

    This software comprises a portable, open source implementation of a runtime library to support applications written in the Unified Parallel C (UPC) language. This library implements the UPC-specific functionality, including shared memory allocation and locks. The network-dependent functionality is implemented as a thin wrapper around a separate library implementing the GASNet (Global-Address Space Networking) specification. For true shared memory machines. GASNet is bypassed in favor of direct memory operations and local synchronization mechanisms. The Berkeleymore » UPC Runtime Library is currently the only implementation of the "Berkeley UPC Runtime Specification", and thus the only runtme library usable with the Berkeley UPC Compiler. Also, it is the only UPC runtime known to the author to provide two shared pointer representations: one for arbitrary blocksizes and one to optimize for the common cases of phaseless and blocksize=1. For distributed memory environments a library implementing the GASNet (Global-Address Space Networking) specification is required for communication. While no specialized hardware is required, a high-speed interconnet supported by the GASNet implementation is suggested for preformance. If no supported high-speed interconnect is available. GASNet can run over MPI. An external library is reqired for certain local memory allocation operations. A well defined interface allows for multiple implementations of this library, but at present the "umalloc" library from LBNL is the only compatible implementation.« less

  12. Parallel Quantum Circuit in a Tunnel Junction.

    PubMed

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

    2016-01-01

    Spectral analysis of 1 and 2-states per line quantum bus are normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, an Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process is not faithfully following the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N(2) power law is preserved for Ωab(N) and for Vab(N). PMID:27453262

  13. Parallel Quantum Circuit in a Tunnel Junction

    NASA Astrophysics Data System (ADS)

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

    2016-07-01

    Spectral analysis of 1 and 2-states per line quantum bus are normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, an Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process is not faithfully following the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N2 power law is preserved for Ωab(N) and for Vab(N).

  14. Parallel Quantum Circuit in a Tunnel Junction.

    PubMed

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

    2016-07-25

    Spectral analysis of 1 and 2-states per line quantum bus are normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, an Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process is not faithfully following the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N(2) power law is preserved for Ωab(N) and for Vab(N).

  15. Parallel Quantum Circuit in a Tunnel Junction

    PubMed Central

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

    2016-01-01

    Spectral analysis of 1 and 2-states per line quantum bus are normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, an Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process is not faithfully following the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N2 power law is preserved for Ωab(N) and for Vab(N). PMID:27453262

  16. Multi-petascale highly efficient parallel supercomputer

    DOEpatents

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.

  17. Towards Distributed Memory Parallel Program Analysis

    SciTech Connect

    Quinlan, D; Barany, G; Panas, T

    2008-06-17

    This paper presents a parallel attribute evaluation for distributed memory parallel computer architectures where previously only shared memory parallel support for this technique has been developed. Attribute evaluation is a part of how attribute grammars are used for program analysis within modern compilers. Within this work, we have extended ROSE, a open compiler infrastructure, with a distributed memory parallel attribute evaluation mechanism to support user defined global program analysis required for some forms of security analysis which can not be addressed by a file by file view of large scale applications. As a result, user defined security analyses may now run in parallel without the user having to specify the way data is communicated between processors. The automation of communication enables an extensible open-source parallel program analysis infrastructure.

  18. Parallel reactor systems for bioprocess development.

    PubMed

    Weuster-Botz, Dirk

    2005-01-01

    Controlled parallel bioreactor systems allow fed-batch operation at early stages of process development. The characteristics of shaken bioreactors operated in parallel (shake flask, microtiter plate), sparged bioreactors (small-scale bubble column) and stirred bioreactors (stirred-tank, stirred column) are briefly summarized. Parallel fed-batch operation is achieved with an intermittent feeding and pH-control system for up to 16 bioreactors operated in parallel on a scale of 100 ml. Examples of the scale-up and scale-down of pH-controlled microbial fed-batch processes demonstrate that controlled parallel reactor systems can result in more effective bioprocess development. Future developments are also outlined, including units of 48 parallel stirred-tank reactors with individual pH- and pO2-controls and automation as well as liquid handling system, operated on a scale of ml.

  19. Design considerations for parallel graphics libraries

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  20. A generic fine-grained parallel C

    NASA Technical Reports Server (NTRS)

    Hamet, L.; Dorband, John E.

    1988-01-01

    With the present availability of parallel processors of vastly different architectures, there is a need for a common language interface to multiple types of machines. The parallel C compiler, currently under development, is intended to be such a language. This language is based on the belief that an algorithm designed around fine-grained parallelism can be mapped relatively easily to different parallel architectures, since a large percentage of the parallelism has been identified. The compiler generates a FORTH-like machine-independent intermediate code. A machine-dependent translator will reside on each machine to generate the appropriate executable code, taking advantage of the particular architectures. The goal of this project is to allow a user to run the same program on such machines as the Massively Parallel Processor, the CRAY, the Connection Machine, and the CYBER 205 as well as serial machines such as VAXes, Macintoshes and Sun workstations.

  1. Running Geant on T. Node parallel computer

    SciTech Connect

    Jejcic, A.; Maillard, J.; Silva, J. ); Mignot, B. )

    1990-08-01

    AnInmos transputer-based computer has been utilized to overcome the difficulties due to the limitations on the processing abilities of event parallelism and multiprocessor farms (i.e., the so called bus-crisis) and the concern regarding the growing sizes of databases typical in High Energy Physics. This study was done on the T.Node parallel computer manufactured by TELMAT. Detailed figures are reported concerning the event parallelization. (AIP)

  2. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.

  3. Parallel computations and control of adaptive structures

    NASA Technical Reports Server (NTRS)

    Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

    1991-01-01

    The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction of the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer and the impact of the approach presented for applications in other disciplines than aerospace industry is assessed.

  4. Parallel Genetic Algorithm for Alpha Spectra Fitting

    NASA Astrophysics Data System (ADS)

    García-Orellana, Carlos J.; Rubio-Montero, Pilar; González-Velasco, Horacio

    2005-01-01

    We present a performance study of alpha-particle spectra fitting using parallel Genetic Algorithm (GA). The method uses a two-step approach. In the first step we run parallel GA to find an initial solution for the second step, in which we use Levenberg-Marquardt (LM) method for a precise final fit. GA is a high resources-demanding method, so we use a Beowulf cluster for parallel simulation. The relationship between simulation time (and parallel efficiency) and processors number is studied using several alpha spectra, with the aim of obtaining a method to estimate the optimal processors number that must be used in a simulation.

  5. Parallel auto-correlative statistics with VTK.

    SciTech Connect

    Pebay, Philippe Pierre; Bennett, Janine Camille

    2013-08-01

    This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.

  6. A parallel algorithm for global routing

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall J.; Banerjee, Prithviraj

    1990-01-01

    A Parallel Hierarchical algorithm for Global Routing (PHIGURE) is presented. The router is based on the work of Burstein and Pelavin, but has many extensions for general global routing and parallel execution. Main features of the algorithm include structured hierarchical decomposition into separate independent tasks which are suitable for parallel execution and adaptive simplex solution for adding feedthroughs and adjusting channel heights for row-based layout. Alternative decomposition methods and the various levels of parallelism available in the algorithm are examined closely. The algorithm is described and results are presented for a shared-memory multiprocessor implementation.

  7. Data-parallel algorithms for image computing

    NASA Astrophysics Data System (ADS)

    Carlotto, Mark J.

    1990-11-01

    Data-parallel algorithms for image computing on the Connection Machine are described. After a brief review of some basic programming concepts in *Lip, a parallel extension of Common Lisp, data-parallel programming paradigms based on a local (diffusion-like) model of computation, the scan model of computation, a general interprocessor communications model, and a region-based model are introduced. Algorithms for connected component labeling, distance transformation, Voronoi diagrams, finding minimum cost paths, local means, shape-from-shading, hidden surface calculations, affine transformation, oblique parallel projection, and spatial operations over regions are presented. An new algorithm for interpolating irregularly spaced data via Voronoi diagrams is also described.

  8. Analysis of the numerical effects of parallelism on a parallel genetic algorithm

    SciTech Connect

    Hart, W.E.; Belew, R.K.; Kohn, S.; Baden, S.

    1995-09-18

    This paper examines the effects of relaxed synchronization on both the numerical and parallel efficiency of parallel genetic algorithms (GAs). We describe a coarse-grain geographically structured parallel genetic algorithm. Our experiments show that asynchronous versions of these algorithms have a lower run time than-synchronous GAs. Furthermore, we demonstrate that this improvement in performance is partly due to the fact that the numerical efficiency of the asynchronous genetic algorithm is better than the synchronous genetic algorithm. Our analysis includes a critique of the utility of traditional parallel performance measures for parallel GAs, and we evaluate the claims made by several researchers that parallel GAs can have superlinear speedup.

  9. Grouping through local, parallel interactions

    NASA Astrophysics Data System (ADS)

    Proesmans, Marc; Van Gool, Luc J.; Oosterlinck, Andre J.

    1995-08-01

    This paper describes a new approach for computer based visual grouping. A number of computational principles are defined related to results on neurophysiological and psychophysical experiments. The grouping principles have been subdivided into two groups. The 'first-order processes' perform local operations on 'basic' features such as luminance, color, and orientation. 'Second-order processes' consider bilocal interactions (stereo, optical flow, texture, symmetry). The computational scheme developed in this paper relies on the solution of a set of nonlinear differential equations. They are referred to as 'coupled diffusion maps'. Such systems obey the prescribed computational principles. Several maps, corresponding to different features, evolve in parallel, while all computations within and between the maps are localized in a small neighborhood. Moreover, interactions between maps are bidirectional and retinotopically organized, features also underlying processing by the human visual system. Within this framework, new techniques are proposed and developed for e.g. the segmentation of oriented textures, stereo analysis, optical flow detection, etc. Experiments show that the underlying algorithms prove to be successful for first-order as well as second-order grouping processes and show the promising possiblities such a framework can offer for a large number of low-level vision tasks.

  10. Vectoring of parallel synthetic jets

    NASA Astrophysics Data System (ADS)

    Berk, Tim; Ganapathisubramani, Bharathram; Gomit, Guillaume

    2015-11-01

    A pair of parallel synthetic jets can be vectored by applying a phase difference between the two driving signals. The resulting jet can be merged or bifurcated and either vectored towards the actuator leading in phase or the actuator lagging in phase. In the present study, the influence of phase difference and Strouhal number on the vectoring behaviour is examined experimentally. Phase-locked vorticity fields, measured using Particle Image Velocimetry (PIV), are used to track vortex pairs. The physical mechanisms that explain the diversity in vectoring behaviour are observed based on the vortex trajectories. For a fixed phase difference, the vectoring behaviour is shown to be primarily influenced by pinch-off time of vortex rings generated by the synthetic jets. Beyond a certain formation number, the pinch-off timescale becomes invariant. In this region, the vectoring behaviour is determined by the distance between subsequent vortex rings. We acknowledge the financial support from the European Research Council (ERC grant agreement no. 277472).

  11. Parallel processing in immune networks

    NASA Astrophysics Data System (ADS)

    Agliari, Elena; Barra, Adriano; Bartolucci, Silvia; Galluzzi, Andrea; Guerra, Francesco; Moauro, Francesco

    2013-04-01

    In this work, we adopt a statistical-mechanics approach to investigate basic, systemic features exhibited by adaptive immune systems. The lymphocyte network made by B cells and T cells is modeled by a bipartite spin glass, where, following biological prescriptions, links connecting B cells and T cells are sparse. Interestingly, the dilution performed on links is shown to make the system able to orchestrate parallel strategies to fight several pathogens at the same time; this multitasking capability constitutes a remarkable, key property of immune systems as multiple antigens are always present within the host. We also define the stochastic process ruling the temporal evolution of lymphocyte activity and show its relaxation toward an equilibrium measure allowing statistical-mechanics investigations. Analytical results are compared with Monte Carlo simulations and signal-to-noise outcomes showing overall excellent agreement. Finally, within our model, a rationale for the experimentally well-evidenced correlation between lymphocytosis and autoimmunity is achieved; this sheds further light on the systemic features exhibited by immune networks.

  12. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1995-01-01

    The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.

  13. Particle Transport in Parallel-Plate Reactors

    SciTech Connect

    Rader, D.J.; Geller, A.S.

    1999-08-01

    A major cause of semiconductor yield degradation is contaminant particles that deposit on wafers while they reside in processing tools during integrated circuit manufacturing. This report presents numerical models for assessing particle transport and deposition in a parallel-plate geometry characteristic of a wide range of single-wafer processing tools: uniform downward flow exiting a perforated-plate showerhead separated by a gap from a circular wafer resting on a parallel susceptor. Particles are assumed to originate either upstream of the showerhead or from a specified position between the plates. The physical mechanisms controlling particle deposition and transport (inertia, diffusion, fluid drag, and external forces) are reviewed, with an emphasis on conditions encountered in semiconductor process tools (i.e., sub-atmospheric pressures and submicron particles). Isothermal flow is assumed, although small temperature differences are allowed to drive particle thermophoresis. Numerical solutions of the flow field are presented which agree with an analytic, creeping-flow expression for Re < 4. Deposition is quantified by use of a particle collection efficiency, which is defined as the fraction of particles in the reactor that deposit on the wafer. Analytic expressions for collection efficiency are presented for the limiting case where external forces control deposition (i.e., neglecting particle diffusion and inertia). Deposition from simultaneous particle diffusion and external forces is analyzed by an Eulerian formulation; for creeping flow and particles released from a planar trap, the analysis yields an analytic, integral expression for particle deposition based on process and particle properties. Deposition from simultaneous particle inertia and external forces is analyzed by a Lagrangian formulation, which can describe inertia-enhanced deposition resulting from particle acceleration in the showerhead. An approximate analytic expression is derived for particle

  14. A component analysis based on serial results analyzing performance of parallel iterative programs

    SciTech Connect

    Richman, S.C.

    1994-12-31

    This research is concerned with the parallel performance of iterative methods for solving large, sparse, nonsymmetric linear systems. Most of the iterative methods are first presented with their time costs and convergence rates examined intensively on sequential machines, and then adapted to parallel machines. The analysis of the parallel iterative performance is more complicated than that of serial performance, since the former can be affected by many new factors, such as data communication schemes, number of processors used, and Ordering and mapping techniques. Although the author is able to summarize results from data obtained after examining certain cases by experiments, two questions remain: (1) How to explain the results obtained? (2) How to extend the results from the certain cases to general cases? To answer these two questions quantitatively, the author introduces a tool called component analysis based on serial results. This component analysis is introduced because the iterative methods consist mainly of several basic functions such as linked triads, inner products, and triangular solves, which have different intrinsic parallelisms and are suitable for different parallel techniques. The parallel performance of each iterative method is first expressed as a weighted sum of the parallel performance of the basic functions that are the components of the method. Then, one separately examines the performance of basic functions and the weighting distributions of iterative methods, from which two independent sets of information are obtained when solving a given problem. In this component approach, all the weightings require only serial costs not parallel costs, and each iterative method for solving a given problem is represented by its unique weighting distribution. The information given by the basic functions is independent of iterative method, while that given by weightings is independent of parallel technique, parallel machine and number of processors.

  15. Parallel barcoding of antibodies for DNA-assisted proteomics.

    PubMed

    Dezfouli, Mahya; Vickovic, Sanja; Iglesias, Maria Jesus; Schwenk, Jochen M; Ahmadian, Afshin

    2014-11-01

    DNA-assisted proteomics technologies enable ultra-sensitive measurements in multiplex format using DNA-barcoded affinity reagents. Although numerous antibodies are available, nowadays targeting nearly the complete human proteome, the majority is not accessible at the quantity, concentration, or purity recommended for most bio-conjugation protocols. Here, we introduce a magnetic bead-assisted DNA-barcoding approach, applicable for several antibodies in parallel, as well as reducing required reagents quantities up to a thousand-fold. The success of DNA-barcoding and retained functionality of antibodies were demonstrated in sandwich immunoassays and standard quantitative Immuno-PCR assays. Specific DNA-barcoding of antibodies for multiplex applications was presented on suspension bead arrays with read-out on a massively parallel sequencing platform in a procedure denoted Immuno-Sequencing. Conclusively, human plasma samples were analyzed to indicate the functionality of barcoded antibodies in intended proteomics applications. PMID:25263329

  16. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  17. Open parabosonic string theory between two parallel Dp-branes

    SciTech Connect

    Hamam, D.; Belaloui, N.

    2012-06-27

    We investigate an open parabosonic string theory between two parallel Dp-branes. The spectrum is constructed and the partition function is derived. A common chord between the development of this latter and the degeneracy of the states for each mass level is obtained. The theory is consistent and with no tachyon. The Virasoro algebra is derived and compared to the one of the ordinary case.

  18. Parallel algorithms for simulating continuous time Markov chains

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.

  19. TSE computers - A means for massively parallel computations

    NASA Technical Reports Server (NTRS)

    Strong, J. P., III

    1976-01-01

    A description is presented of hardware concepts for building a massively parallel processing system for two-dimensional data. The processing system is to use logic arrays of 128 x 128 elements which perform over 16 thousand operations simultaneously. Attention is given to image data, logic arrays, basic image logic functions, a prototype negator, an interleaver device, image logic circuits, and an image memory circuit.

  20. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    NASA Astrophysics Data System (ADS)

    Lu, Liuyan; Lantz, Steven R.; Ren, Zhuyin; Pope, Stephen B.

    2009-08-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel

  1. Serial Order: A Parallel Distributed Processing Approach.

    ERIC Educational Resources Information Center

    Jordan, Michael I.

    Human behavior shows a variety of serially ordered action sequences. This paper presents a theory of serial order which describes how sequences of actions might be learned and performed. In this theory, parallel interactions across time (coarticulation) and parallel interactions across space (dual-task interference) are viewed as two aspects of a…

  2. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele

    2001-01-01

    This viewgraph presentation provides information on support sources available for the automatic parallelization of computer program. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence, ensuring that the code transformation was accurate.

  3. Parallel processing of numerical transport algorithms

    SciTech Connect

    Wienke, B.R.; Hiromoto, R.E.

    1984-01-01

    The multigroup, discrete ordinates representation for the linear transport equation enjoys widespread computational use and popularity. Serial solution schemes and numerical algorithms developed over the years provide a timely framework for parallel extension. On the Denelcor HEP, we investigate the parallel structure and extension of a number of standard S/sub n/ approaches. Concurrent inner sweeps, coupled acceleration techniques, synchronized inner-outer loops, and chaotic iteration are described, and results of computations are contrasted. The multigroup representation and serial iteration methods are also detailed. The basic iterative S/sub n/ method lends itself to parallel tasking, portably affording an effective medium for performing transport calculations on future architectures. This analysis represents a first attempt to extend serial S/sub n/ algorithms to parallel environments and provides good baseline estimates on ease of parallel implementation, relative algorithm efficiency, comparative speedup, and some future directions. We find basic inner-outer and chaotic iteration strategies both easily support comparably high degrees of parallelism. Both accommodate parallel rebalance and diffusion acceleration and appear as robust and viable parallel techniques for S/sub n/ production work.

  4. Software For Diagnosis Of Parallel Processing

    NASA Technical Reports Server (NTRS)

    Hontalas, Philip; Yan, Jerry; Fineman, Charles

    1995-01-01

    Ames Instrumentation System (AIMS) computer program package of software tools measuring and analyzing performances of parallel-processing application programs. Helps programmer to debug and refine, and to monitor and visualize execution of, parallel-processing application software for Intel iPSC/860 (or equivalent) multicomputer. Performance data collected displayed graphically on computer workstations supporting X-Windows.

  5. Parallel unstructured grid generation for computational aerosciences

    NASA Technical Reports Server (NTRS)

    Shephard, Mark S.

    1993-01-01

    The objective of this research project is to develop efficient parallel automatic grid generation procedures for use in computational aerosciences. This effort is focused on a parallel version of the Finite Octree grid generator. Progress made during the first six months is reported.

  6. Parallel Activation in Bilingual Phonological Processing

    ERIC Educational Resources Information Center

    Lee, Su-Yeon

    2011-01-01

    In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…

  7. RAM-Based parallel-output controller

    NASA Technical Reports Server (NTRS)

    Niswander, J. K.; Stattel, R. J.

    1980-01-01

    Selected bit strings in serial-data link are extracted for processing. Controller is programmable interface between serial-data link and peripherals that accept parallel data. It can be used to drive displays, printers, plotters, digital-to-analog converters, and parallel-output ports.

  8. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  9. MULTIOBJECTIVE PARALLEL GENETIC ALGORITHM FOR WASTE MINIMIZATION

    EPA Science Inventory

    In this research we have developed an efficient multiobjective parallel genetic algorithm (MOPGA) for waste minimization problems. This MOPGA integrates PGAPack (Levine, 1996) and NSGA-II (Deb, 2000) with novel modifications. PGAPack is a master-slave parallel implementation of a...

  10. Bayer image parallel decoding based on GPU

    NASA Astrophysics Data System (ADS)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.

  11. Parallel hypergraph partitioning for scientific computing.

    SciTech Connect

    Heaphy, Robert; Devine, Karen Dragon; Catalyurek, Umit; Bisseling, Robert; Hendrickson, Bruce Alan; Boman, Erik Gunnar

    2005-07-01

    Graph partitioning is often used for load balancing in parallel computing, but it is known that hypergraph partitioning has several advantages. First, hypergraphs more accurately model communication volume, and second, they are more expressive and can better represent nonsymmetric problems. Hypergraph partitioning is particularly suited to parallel sparse matrix-vector multiplication, a common kernel in scientific computing. We present a parallel software package for hypergraph (and sparse matrix) partitioning developed at Sandia National Labs. The algorithm is a variation on multilevel partitioning. Our parallel implementation is novel in that it uses a two-dimensional data distribution among processors. We present empirical results that show our parallel implementation achieves good speedup on several large problems (up to 33 million nonzeros) with up to 64 processors on a Linux cluster.

  12. Differences Between Distributed and Parallel Systems

    SciTech Connect

    Brightwell, R.; Maccabe, A.B.; Rissen, R.

    1998-10-01

    Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.

  13. A paradigm for parallel unstructured grid generation

    SciTech Connect

    Gaither, A.; Marcum, D.; Reese, D.

    1996-12-31

    In this paper, a sequential 2D unstructured grid generator based on iterative point insertion and local reconnection is coupled with a Delauney tessellation domain decomposition scheme to create a scalable parallel unstructured grid generator. The Message Passing Interface (MPI) is used for distributed communication in the parallel grid generator. This work attempts to provide a generic framework to enable the parallelization of fast sequential unstructured grid generators in order to compute grand-challenge scale grids for Computational Field Simulation (CFS). Motivation for moving from sequential to scalable parallel grid generation is presented. Delaunay tessellation and iterative point insertion and local reconnection (advancing front method only) unstructured grid generation techniques are discussed with emphasis on how these techniques can be utilized for parallel unstructured grid generation. Domain decomposition techniques are discussed for both Delauney and advancing front unstructured grid generation with emphasis placed on the differences needed for both grid quality and algorithmic efficiency.

  14. Broadcasting a message in a parallel computer

    DOEpatents

    Berg, Jeremy E.; Faraj, Ahmad A.

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.

  15. Configuration space representation in parallel coordinates

    NASA Technical Reports Server (NTRS)

    Fiorini, Paolo; Inselberg, Alfred

    1989-01-01

    By means of a system of parallel coordinates, a nonprojective mapping from R exp N to R squared is obtained for any positive integer N. In this way multivariate data and relations can be represented in the Euclidean plane (embedded in the projective plane). Basically, R squared with Cartesian coordinates is augmented by N parallel axes, one for each variable. The N joint variables of a robotic device can be represented graphically by using parallel coordinates. It is pointed out that some properties of the relation are better perceived visually from the parallel coordinate representation, and that new algorithms and data structures can be obtained from this representation. The main features of parallel coordinates are described, and an example is presented of their use for configuration space representation of a mechanical arm (where Cartesian coordinates cannot be used).

  16. Genetic Parallel Programming: design and implementation.

    PubMed

    Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

    2006-01-01

    This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

  17. Parallel tempering for the traveling salesman problem

    SciTech Connect

    Percus, Allon; Wang, Richard; Hyman, Jeffrey; Caflisch, Russel

    2008-01-01

    We explore the potential of parallel tempering as a combinatorial optimization method, applying it to the traveling salesman problem. We compare simulation results of parallel tempering with a benchmark implementation of simulated annealing, and study how different choices of parameters affect the relative performance of the two methods. We find that a straightforward implementation of parallel tempering can outperform simulated annealing in several crucial respects. When parameters are chosen appropriately, both methods yield close approximation to the actual minimum distance for an instance with 200 nodes. However, parallel tempering yields more consistently accurate results when a series of independent simulations are performed. Our results suggest that parallel tempering might offer a simple but powerful alternative to simulated annealing for combinatorial optimization problems.

  18. Randomized parallel speedups for list ranking

    SciTech Connect

    Vishkin, U.

    1987-06-01

    The following problem is considered: given a linked list of length n, compute the distance of each element of the linked list from the end of the list. The problem has two standard deterministic algorithms: a linear time serial algorithm, and an O(n log n)/ rho + log n) time parallel algorithm using rho processors. The authors present a randomized parallel algorithm for the problem. The algorithm is designed for an exclusive-read exclusive-write parallel random access machine (EREW PRAM). It runs almost surely in time O(n/rho + log n log* n) using rho processors. Using a recently published parallel prefix sums algorithm the list-ranking algorithm can be adapted to run on a concurrent-read concurrent-write parallel random access machine (CRCW PRAM) almost surely in time O(n/rho + log n) using rho processors.

  19. National Combustion Code: Parallel Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.

    2000-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.

  20. Implementation and performance of parallel Prolog interpreter

    SciTech Connect

    Wei, S.; Kale, L.V.; Balkrishna, R. . Dept. of Computer Science)

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.

  1. Parallel Algebraic Multigrid Methods - High Performance Preconditioners

    SciTech Connect

    Yang, U M

    2004-11-11

    The development of high performance, massively parallel computers and the increasing demands of computationally challenging applications have necessitated the development of scalable solvers and preconditioners. One of the most effective ways to achieve scalability is the use of multigrid or multilevel techniques. Algebraic multigrid (AMG) is a very efficient algorithm for solving large problems on unstructured grids. While much of it can be parallelized in a straightforward way, some components of the classical algorithm, particularly the coarsening process and some of the most efficient smoothers, are highly sequential, and require new parallel approaches. This chapter presents the basic principles of AMG and gives an overview of various parallel implementations of AMG, including descriptions of parallel coarsening schemes and smoothers, some numerical results as well as references to existing software packages.

  2. p-MEMPSODE: Parallel and irregular memetic global optimization

    NASA Astrophysics Data System (ADS)

    Voglis, C.; Hadjidoukas, P. E.; Parsopoulos, K. E.; Papageorgiou, D. G.; Lagaris, I. E.; Vrahatis, M. N.

    2015-12-01

    A parallel memetic global optimization algorithm suitable for shared memory multicore systems is proposed and analyzed. The considered algorithm combines two well-known and widely used population-based stochastic algorithms, namely Particle Swarm Optimization and Differential Evolution, with two efficient and parallelizable local search procedures. The sequential version of the algorithm was first introduced as MEMPSODE (MEMetic Particle Swarm Optimization and Differential Evolution) and published in the CPC program library. We exploit the inherent and highly irregular parallelism of the memetic global optimization algorithm by means of a dynamic and multilevel approach based on the OpenMP tasking model. In our case, tasks correspond to local optimization procedures or simple function evaluations. Parallelization occurs at each iteration step of the memetic algorithm without affecting its searching efficiency. The proposed implementation, for the same random seed, reaches the same solution irrespectively of being executed sequentially or in parallel. Extensive experimental evaluation has been performed in order to illustrate the speedup achieved on a shared-memory multicore server.

  3. A portable MPI-based parallel vector template library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  4. A Portable MPI-Based Parallel Vector Template Library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C + + by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of c or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  5. Numerical analysis of electrical defibrillation. The parallel approach.

    PubMed

    Ng, K T; Hutchinson, S A; Gao, S

    1995-01-01

    Numerical modeling offers a viable tool for studying electrical defibrillation, allowing the behavior of field quantities to be observed easily as the different system parameters are varied. One numerical technique, namely the finite-element method, has been found particularly effective for modeling complex thoracic anatomies. However, an accurate finite-element model of the thorax often requires a large number of elements and nodes, leading to a large set of equations that cannot be solved effectively with the computational power of conventional computers. This is especially true if many finite-element solutions need to be achieved within a reasonable time period (eg, electrode configuration optimization). In this study, the use of massively parallel computers to provide the memory and reduction in solution time for solving these large finite-element problems is discussed. Both the uniform and unstructured grid approaches are considered. Algorithms that allow efficient mapping of uniform and unstructured grids to data-parallel and message-passing parallel computers are discussed. An automatic iterative procedure for electrode configuration optimization is presented. The procedure is based on the minimization of an objective function using the parallel direct search technique. Computational performance results are presented together with simulation results. PMID:8656104

  6. Parallel implementation of electronic structure energy, gradient, and Hessian calculations.

    PubMed

    Lotrich, V; Flocke, N; Ponton, M; Yau, A D; Perera, A; Deumens, E; Bartlett, R J

    2008-05-21

    ACES III is a newly written program in which the computationally demanding components of the computational chemistry code ACES II [J. F. Stanton et al., Int. J. Quantum Chem. 526, 879 (1992); [ACES II program system, University of Florida, 1994] have been redesigned and implemented in parallel. The high-level algorithms include Hartree-Fock (HF) self-consistent field (SCF), second-order many-body perturbation theory [MBPT(2)] energy, gradient, and Hessian, and coupled cluster singles, doubles, and perturbative triples [CCSD(T)] energy and gradient. For SCF, MBPT(2), and CCSD(T), both restricted HF and unrestricted HF reference wave functions are available. For MBPT(2) gradients and Hessians, a restricted open-shell HF reference is also supported. The methods are programed in a special language designed for the parallelization project. The language is called super instruction assembly language (SIAL). The design uses an extreme form of object-oriented programing. All compute intensive operations, such as tensor contractions and diagonalizations, all communication operations, and all input-output operations are handled by a parallel program written in C and FORTRAN 77. This parallel program, called the super instruction processor (SIP), interprets and executes the SIAL program. By separating the algorithmic complexity (in SIAL) from the complexities of execution on computer hardware (in SIP), a software system is created that allows for very effective optimization and tuning on different hardware architectures with quite manageable effort. PMID:18500853

  7. Parallel implementation of electronic structure energy, gradient, and Hessian calculations

    NASA Astrophysics Data System (ADS)

    Lotrich, V.; Flocke, N.; Ponton, M.; Yau, A. D.; Perera, A.; Deumens, E.; Bartlett, R. J.

    2008-05-01

    ACES III is a newly written program in which the computationally demanding components of the computational chemistry code ACES II [J. F. Stanton et al., Int. J. Quantum Chem. 526, 879 (1992); [ACES II program system, University of Florida, 1994] have been redesigned and implemented in parallel. The high-level algorithms include Hartree-Fock (HF) self-consistent field (SCF), second-order many-body perturbation theory [MBPT(2)] energy, gradient, and Hessian, and coupled cluster singles, doubles, and perturbative triples [CCSD(T)] energy and gradient. For SCF, MBPT(2), and CCSD(T), both restricted HF and unrestricted HF reference wave functions are available. For MBPT(2) gradients and Hessians, a restricted open-shell HF reference is also supported. The methods are programed in a special language designed for the parallelization project. The language is called super instruction assembly language (SIAL). The design uses an extreme form of object-oriented programing. All compute intensive operations, such as tensor contractions and diagonalizations, all communication operations, and all input-output operations are handled by a parallel program written in C and FORTRAN 77. This parallel program, called the super instruction processor (SIP), interprets and executes the SIAL program. By separating the algorithmic complexity (in SIAL) from the complexities of execution on computer hardware (in SIP), a software system is created that allows for very effective optimization and tuning on different hardware architectures with quite manageable effort.

  8. Parallel reservoir computing using optical amplifiers.

    PubMed

    Vandoorne, Kristof; Dambre, Joni; Verstraeten, David; Schrauwen, Benjamin; Bienstman, Peter

    2011-09-01

    Reservoir computing (RC), a computational paradigm inspired on neural systems, has become increasingly popular in recent years for solving a variety of complex recognition and classification problems. Thus far, most implementations have been software-based, limiting their speed and power efficiency. Integrated photonics offers the potential for a fast, power efficient and massively parallel hardware implementation. We have previously proposed a network of coupled semiconductor optical amplifiers as an interesting test case for such a hardware implementation. In this paper, we investigate the important design parameters and the consequences of process variations through simulations. We use an isolated word recognition task with babble noise to evaluate the performance of the photonic reservoirs with respect to traditional software reservoir implementations, which are based on leaky hyperbolic tangent functions. Our results show that the use of coherent light in a well-tuned reservoir architecture offers significant performance benefits. The most important design parameters are the delay and the phase shift in the system's physical connections. With optimized values for these parameters, coherent semiconductor optical amplifier (SOA) reservoirs can achieve better results than traditional simulated reservoirs. We also show that process variations hardly degrade the performance, but amplifier noise can be detrimental. This effect must therefore be taken into account when designing SOA-based RC implementations.

  9. Highly Parallel, High-Precision Numerical Integration

    SciTech Connect

    Bailey, David H.; Borwein, Jonathan M.

    2005-04-22

    This paper describes a scheme for rapidly computing numerical values of definite integrals to very high accuracy, ranging from ordinary machine precision to hundreds or thousands of digits, even for functions with singularities or infinite derivatives at endpoints. Such a scheme is of interest not only in computational physics and computational chemistry, but also in experimental mathematics, where high-precision numerical values of definite integrals can be used to numerically discover new identities. This paper discusses techniques for a parallel implementation of this scheme, then presents performance results for 1-D and 2-D test suites. Results are also given for a certain problem from mathematical physics, which features a difficult singularity, confirming a conjecture to 20,000 digit accuracy. The performance rate for this latter calculation on 1024 CPUs is 690 Gflop/s. We believe that this and one other 20,000-digit integral evaluation that we report are the highest-precision non-trivial numerical integrations performed to date.

  10. Parallel Midbrain Microcircuits Perform Independent Temporal Transformations

    PubMed Central

    Huguenard, John; Knudsen, Eric

    2014-01-01

    The capacity to select the most important information and suppress distracting information is crucial for survival. The midbrain contains a network critical for the selection of the strongest stimulus for gaze and attention. In avians, the optic tectum (OT; called the superior colliculus in mammals) and the GABAergic nucleus isthmi pars magnocellularis (Imc) cooperate in the selection process. In the chicken, OT layer 10, located in intermediate layers, responds to afferent input with gamma periodicity (25–75 Hz), measured at the level of individual neurons and the local field potential. In contrast, Imc neurons, which receive excitatory input from layer 10 neurons, respond with tonic, unusually high discharge rates (>150 spikes/s). In this study, we reveal the source of this high-rate inhibitory activity: layer 10 neurons that project to the Imc possess specialized biophysical properties that enable them to transform afferent drive into high firing rates (∼130 spikes/s), whereas neighboring layer 10 neurons, which project elsewhere, transform afferent drive into lower-frequency, periodic discharge patterns. Thus, the intermediate layers of the OT contain parallel, intercalated microcircuits that generate different temporal patterns of activity linked to the functions of their respective downstream targets. PMID:24920618

  11. Parallelized cytoindentation using convex micropatterned surfaces.

    PubMed

    Jia, Bojing; Wee, Tse-Luen; Boudreau, Colton G; Berard, Daniel J; Mallik, Adiel; Juncker, David; Brown, Claire M; Leslie, Sabrina R

    2016-01-01

    Here we present a high-throughput, parallelized cytoindentor for local compression of live cells. The cytoindentor uses convex lens-induced confinement (CLiC) to indent micrometer-sized areas in single cells and/or populations of cells with submicron precision. This is accomplished using micropatterned poly(dimethylsiloxane) (PDMS) films that are adhered to a convex lens to create arrays of extrusions referred to here as "posts." These posts caused local deformation of subcellular regions without any evidence of cell lysis upon CLiC indentation. Our micropost arrays were also functionalized with glycoproteins, such as fibronectin, to both pull and compress cells under customized confinement geometries. Measurements of Chinese hamster ovary (CHO-K1) cell migration trajectories and oxidative stress showed that the CLiC device did not damage or significantly stress the cells. Our novel tool opens a new area of investigation for visualizing mechanobiology and mechanochemistry within living cells, and the high-throughput nature of the technique will streamline investigations as current tools for mechanically probing material properties and molecular dynamics within cells, such as traditional cytoindentors and atomic force microscopy (AFM), are typically restricted to single-cell manipulation. PMID:27528072

  12. Parallel Processing in Face Perception

    ERIC Educational Resources Information Center

    Martens, Ulla; Leuthold, Hartmut; Schweinberger, Stefan R.

    2010-01-01

    The authors examined face perception models with regard to the functional and temporal organization of facial identity and expression analysis. Participants performed a manual 2-choice go/no-go task to classify faces, where response hand depended on facial familiarity (famous vs. unfamiliar) and response execution depended on facial expression…

  13. [Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

    PubMed

    Furuta, Takuya; Sato, Tatsuhiko

    2015-01-01

    Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.

  14. Improved packing of protein side chains with parallel ant colonies

    PubMed Central

    2014-01-01

    Introduction The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many of existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. Methods We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamer by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. Results We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive to state-of-the-art solutions for packing side chains. Conclusions This parallel approach combines various sources of searching intelligence and energy

  15. Applications of Parallel Processing in Configuration Analyses

    NASA Technical Reports Server (NTRS)

    Sundaram, Ppchuraman; Hager, James O.; Biedron, Robert T.

    1999-01-01

    The paper presents the recent progress made towards developing an efficient and user-friendly parallel environment for routine analysis of large CFD problems. The coarse-grain parallel version of the CFL3D Euler/Navier-Stokes analysis code, CFL3Dhp, has been ported onto most available parallel platforms. The CFL3Dhp solution accuracy on these parallel platforms has been verified with the CFL3D sequential analyses. User-friendly pre- and post-processing tools that enable a seamless transfer from sequential to parallel processing have been written. Static load balancing tool for CFL3Dhp analysis has also been implemented for achieving good parallel efficiency. For large problems, load balancing efficiency as high as 95% can be achieved even when large number of processors are used. Linear scalability of the CFL3Dhp code with increasing number of processors has also been shown using a large installed transonic nozzle boattail analysis. To highlight the fast turn-around time of parallel processing, the TCA full configuration in sideslip Navier-Stokes drag polar at supersonic cruise has been obtained in a day. CFL3Dhp is currently being used as a production analysis tool.

  16. Portable parallel programming in a Fortran environment

    SciTech Connect

    May, E.N.

    1989-01-01

    Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a ''nearly realistic'' lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs.

  17. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.

  18. On the Scalability of Parallel UCT

    NASA Astrophysics Data System (ADS)

    Segal, Richard B.

    The parallelization of MCTS across multiple-machines has proven surprisingly difficult. The limitations of existing algorithms were evident in the 2009 Computer Olympiad where Zen using a single four-core machine defeated both Fuego with ten eight-core machines, and Mogo with twenty thirty-two core machines. This paper investigates the limits of parallel MCTS in order to understand why distributed parallelism has proven so difficult and to pave the way towards future distributed algorithms with better scaling. We first analyze the single-threaded scaling of Fuego and find that there is an upper bound on the play-quality improvements which can come from additional search. We then analyze the scaling of an idealized N-core shared memory machine to determine the maximum amount of parallelism supported by MCTS. We show that parallel speedup depends critically on how much time is given to each player. We use this relationship to predict parallel scaling for time scales beyond what can be empirically evaluated due to the immense computation required. Our results show that MCTS can scale nearly perfectly to at least 64 threads when combined with virtual loss, but without virtual loss scaling is limited to just eight threads. We also find that for competition time controls scaling to thousands of threads is impossible not necessarily due to MCTS not scaling, but because high levels of parallelism can start to bump up against the upper performance bound of Fuego itself.

  19. Performance of the Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance 1/O to applications the applications that rely on them. In Section 3 we describe that access data in patterns that have been observed to be common.

  20. Parallel distance matrix computation for Matlab data mining

    NASA Astrophysics Data System (ADS)

    Skurowski, Przemysław; Staniszewski, Michał

    2016-06-01

    The paper presents utility functions for computing of a distance matrix, which plays a crucial role in data mining. The goal in the design was to enable operating on relatively large datasets by overcoming basic shortcoming - computing time - with an interface easy to use. The presented solution is a set of functions, which were created with emphasis on practical applicability in real life. The proposed solution is presented along the theoretical background for the performance scaling. Furthermore, different approaches of the parallel computing are analyzed, including shared memory, which is uncommon in Matlab environment.

  1. Poisson-Boltzmann theory for two parallel uniformly charged plates

    NASA Astrophysics Data System (ADS)

    Xing, Xiangjun

    2011-04-01

    We solve the nonlinear Poisson-Boltzmann equation for two parallel and like-charged plates both inside a symmetric electrolyte, and inside a 2:1 asymmetric electrolyte, in terms of Weierstrass elliptic functions. From these solutions we derive the functional relation between the surface charge density, the plate separation, and the pressure between plates. For the one plate problem, we obtain exact expressions for the electrostatic potential and for the renormalized surface charge density, both in symmetric and in asymmetric electrolytes. For the two plate problems, we obtain new exact asymptotic results in various regimes.

  2. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  3. Xyce parallel electronic simulator : users' guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique

  4. Parallel Processing of Broad-Band PPM Signals

    NASA Technical Reports Server (NTRS)

    Gray, Andrew; Kang, Edward; Lay, Norman; Vilnrotter, Victor; Srinivasan, Meera; Lee, Clement

    2010-01-01

    A parallel-processing algorithm and a hardware architecture to implement the algorithm have been devised for timeslot synchronization in the reception of pulse-position-modulated (PPM) optical or radio signals. As in the cases of some prior algorithms and architectures for parallel, discrete-time, digital processing of signals other than PPM, an incoming broadband signal is divided into multiple parallel narrower-band signals by means of sub-sampling and filtering. The number of parallel streams is chosen so that the frequency content of the narrower-band signals is low enough to enable processing by relatively-low speed complementary metal oxide semiconductor (CMOS) electronic circuitry. The algorithm and architecture are intended to satisfy requirements for time-varying time-slot synchronization and post-detection filtering, with correction of timing errors independent of estimation of timing errors. They are also intended to afford flexibility for dynamic reconfiguration and upgrading. The architecture is implemented in a reconfigurable CMOS processor in the form of a field-programmable gate array. The algorithm and its hardware implementation incorporate three separate time-varying filter banks for three distinct functions: correction of sub-sample timing errors, post-detection filtering, and post-detection estimation of timing errors. The design of the filter bank for correction of timing errors, the method of estimating timing errors, and the design of a feedback-loop filter are governed by a host of parameters, the most critical one, with regard to processing very broadband signals with CMOS hardware, being the number of parallel streams (equivalently, the rate-reduction parameter).

  5. PDS/PIO: Lightweight Libraries for Collective Parallel I/O

    SciTech Connect

    Chen, P.; Christon, M.; Heermann, P.D.; Sturtevant, J.

    1998-10-28

    PDS/PIO is a lightweight, parallel interface designed to support efficient transfers of massive, grid-based, simulation data among memory, disk, and tape subsystems. The higher-level PDS (Parallel Data Set) interface manages data with tensor and unstructured grid abstractions, while the lower-level PIO (Parallel Input/Output) interface accesses data arrays with arbitrary permutation, and provides communication and collective 1/0 operations. Higher-level data abstraction for finite element applications is provided by PXI (Parallel Exodus Interface), which supports, in parallel, functionality of Exodus 11, a finite element data model developed at Sandia National Laboratories. The entire interface is implemented in C with Fortran-callable PDS and PXI wrappers.

  6. PDS/PIO: Lightweight libraries for collective parallel I/O

    SciTech Connect

    STURTEVANT,JUDITH E.; CHEN,PANG-CHIEH; HEERMANN,PHILIP D.; CHRISTON,MARK A.; GREENBERG,DAVID S.

    2000-02-01

    PDS/PIO is a lightweight, parallel interface designed to support efficient transfers of massive, grid-based, simulation data among memory, disk, and tape subsystems. The higher-level PDS (Parallel Data Set) interface manages data with tensor and unstructured grid abstractions, while the lower-level PIO (Parallel Input/Output) interface accesses data arrays with arbitrary permutation, and provides communication and collective I/O operations. Higher-level data abstraction for finite element applications is provided by PXI (Parallel Exodus Interface), which supports, in parallel, functionality of Exodus II, a finite element data model developed at Sandia National Laboratories. The entire interface is implemented in C with Fortran-callable PDS and PXI wrappers.

  7. Research on Reverse Recovery Transient of Parallel Thyristors for Fusion Power Supply

    NASA Astrophysics Data System (ADS)

    Zha, Fengwei; Song, Zhiquan; Fu, Peng; Dong, Lin; Wang, Min

    2014-07-01

    In nuclear fusion power supply systems, the thyristors often need to be connected in parallel for sustaining large current. However, research on the reverse recovery transient of parallel thyristors has not been reported yet. When several thyristors are connected in parallel, they cannot turn-off at the same moment, and thus the turn-off model based on a single thyristor is no longer suitable. In this paper, an analysis is presented for the reverse recovery transient of parallel thyristors. Parallel thyristors can be assumed as one virtual thyristor so that the reverse recovery current can be modeled by an exponential function. Through equivalent transformation of the rectifier circuit, the commutating over-voltage can be calculated based on Kirchhoff's equation. The reverse recovery current and commutation over-voltage waveforms are measured on an experiment platform for a high power rectifier supply. From the measurement results, it is concluded that the modeling method is acceptable.

  8. SLAPP: A systolic linear algebra parallel processor

    SciTech Connect

    Drake, B.L.; Luk, F.T.; Speiser, J.M.; Symanski, J.J.

    1987-07-01

    Systolic array computer architectures provide a means for fast computation of the linear algebra algorithms that form the building blocks of many signal-processing algorithms, facilitating their real-time computation. For applications to signal processing, the systolic array operates on matrices, an inherently parallel view of the data, using numerical linear algebra algorithms that have been suitably parallelized to efficiently utilize the available hardware. This article describes work currently underway at the Naval Ocean Systems Center, San Diego, California, to build a two-dimensional systolic array, SLAPP, demonstrating efficient and modular parallelization of key matric computations for real-time signal- and image-processing problems.

  9. Parallelization of the Implicit RPLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Orkwis, Paul D.

    1997-01-01

    The multiblock reacting Navier-Stokes flow solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.

  10. Parallelization of the Implicit RPLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Orkwis, Paul D.

    1994-01-01

    The multiblock reacting Navier-Stokes flow-solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.

  11. Parallel optical memories for very large databases

    NASA Astrophysics Data System (ADS)

    Mitkas, Pericles A.; Berra, P. B.

    1993-02-01

    The steady increase in volume of current and future databases dictates the development of massive secondary storage devices that allow parallel access and exhibit high I/O data rates. Optical memories, such as parallel optical disks and holograms, can satisfy these requirements because they combine high recording density and parallel one- or two-dimensional output. Several configurations for database storage involving different types of optical memory devices are investigated. All these approaches include some level of optical preprocessing in the form of data filtering in an attempt to reduce the amount of data per transaction that reach the electronic front-end.

  12. Time-parallel multiscale/multiphysics framework

    SciTech Connect

    Frantziskonis, G.; Muralidharan, Krishna; Deymier, Pierre; Simunovic, Srdjan; Nukala, Phani K; Pannala, Sreekanth

    2009-01-01

    We introduce the time-parallel compound wavelet matrix method (tpCWM) for modeling the temporal evolution of multiscale and multiphysics systems. The method couples time parallel (TP) and CWM methods operating at different spatial and temporal scales. We demonstrate the efficiency of our approach on two examples: a chemical reaction kinetic system and a non-linear predator prey system. Our results indicate that the tpCWM technique is capable of accelerating time-to-solution by 2 3-orders of magnitude and is amenable to efficient parallel implementation.

  13. Parallel electric fields from ionospheric winds

    NASA Technical Reports Server (NTRS)

    Nakada, M. P.

    1987-01-01

    The possible production of electric fields parallel to the magnetic field by dynamo winds in the E region is examined, using a jet stream wind model. Current return paths through the F region above the stream are examined as well as return paths through the conjugate ionosphere. The Wulf geometry with horizontal winds moving in opposite directions one above the other is also examined. Parallel electric fields are found to depend strongly on the width of current sheets at the edges of the jet stream. If these are narrow enough, appreciable parallel electric fields are produced.

  14. On the parallel solution of parabolic equations

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Youcef

    1989-01-01

    Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.

  15. Distributed parallel messaging for multiprocessor systems

    SciTech Connect

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.

  16. Structural mechanics computations on parallel computing platforms

    SciTech Connect

    Kulak, R.F.; Plaskacz, E.J.; Pfeiffer, P.A.

    1995-06-01

    With recent advances in parallel supercomputers and network-connected workstations, the solution to large scale structural engineering problems has now become tractable. High-performance computer architectures, which are usually available at large universities and national laboratories, now can solve large nonlinear problems. At the other end of the spectrum, network connected workstations can be configured to become a distributed-parallel computer. This approach is attractive to small, medium and large engineering firms. This paper describes the development of a parallelized finite element computer program for the solution of static, nonlinear structural mechanics problems.

  17. Parallel Communicating Grammar Systems with Regular Control

    NASA Astrophysics Data System (ADS)

    Pardubská, Dana; Plátek, Martin; Otto, Friedrich

    Parallel communicating grammar systems with regular control (RPCGS, for short) are introduced, which are obtained from returning regular parallel communicating grammar systems by restricting the derivations that are executed in parallel by the various components through a regular control language. For the class of languages that are generated by RPCGSs with constant communication complexity we derive a characterization in terms of a restricted type of freely rewriting restarting automaton. From this characterization we obtain that these languages are semi-linear, and that centralized RPCGSs with constant communication complexity are of the same generative power as non-centralized RPCGSs with constant communication complexity.

  18. Parallel Climate Analysis Toolkit (ParCAT)

    SciTech Connect

    Smith, Brian Edward

    2013-06-30

    The parallel analysis toolkit (ParCAT) provides parallel statistical processing of large climate model simulation datasets. ParCAT provides parallel point-wise average calculations, frequency distributions, sum/differences of two datasets, and difference-of-average and average-of-difference for two datasets for arbitrary subsets of simulation time. ParCAT is a command-line utility that can be easily integrated in scripts or embedded in other application. ParCAT supports CMIP5 post-processed datasets as well as non-CMIP5 post-processed datasets. ParCAT reads and writes standard netCDF files.

  19. Knowledge representation into Ada parallel processing

    NASA Technical Reports Server (NTRS)

    Masotto, Tom; Babikyan, Carol; Harper, Richard

    1990-01-01

    The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.

  20. Smart structures technologies for parallel kinematics in handling and assembly

    NASA Astrophysics Data System (ADS)

    Keimer, Ralf; Algermissen, Stephan; Pavlovic, Nenad; Budde, Christoph

    2007-04-01

    Parallel kinematics offer a high potential for increasing performance of machines for handling and assembly. Due to greater stiffness and reduced moving masses compared to typical serial kinematics, higher accelerations and thus lower cycle times can be achieved, which is an essential benchmark for high performance in handling and assembly. However, there are some challenges left to be able to fully exploit the potential of such machines. Some of these challenges are inherent to parallel kinematics, like a low ratio between work and installation space or a considerably changing structural elasticity as a function of the position in work space. Other difficulties arise from high accelerations, which lead to high dynamic loads inducing significant vibrations. While it is essential to cope with the challenges of parallel kinematics in the design-process, smart structures technologies lend themselves as means to face some of these challenges. In this paper a 4-degree of freedom parallel mechanism based on a triglide structure is presented. This machine was designed in a way to overcome the problem of low ratio between work and installation space, by allowing for a change of the structure's configuration with the purpose of increasing the work space. Furthermore, an active vibration suppression was designed and incorporated using rods with embedded piezoceramic actuators. The design of these smart structural parts is discussed and experimental results regarding the vibration suppression are shown. Adaptive joints are another smart structures technology, which can be used to increase the performance of parallel kinematics. The adaptiveness of such joints is reflected in their ability to change their friction attributes, whereas they can be used on one hand to suppress vibrations and on the other hand to change the degrees of freedom (DOF). The vibration suppression is achieved by increasing structural damping at the end of a trajectory and by maintaining low friction

  1. The Construction of Parallel Tests from IRT-Based Item Banks. Project Psychometric Aspects of Item Banking No. 43. Research Report 89-2.

    ERIC Educational Resources Information Center

    Boekkooi-Timminga, Ellen

    The construction of parallel tests from item response theory (IRT) based item banks is discussed. Tests are considered parallel whenever their information functions are identical. After the methods for constructing parallel tests are considered, the computational complexity of 0-1 linear programming and the heuristic procedure applied are…

  2. Data-driven parallel architecture for syntactic pattern recognition

    NASA Astrophysics Data System (ADS)

    Tseng, Chien-Chao; Hwang, Shu-Yuen

    1991-02-01

    Syntax analysis is the primary operation of a Syntactic Pattern Recognition (SPR) system. A real time SPR system would require efficient architectural supports for syntax analysis. The process of syntax analysis and the execution of a logic program are closely related. In this paper we propose a data-driven parallel architecture for syntax analysis based on the principle of parallel execution of logic programs. The proposed architecture is hybrid in the sense that its functional units unlike those itt traditional fine-grain datafiow model are coarse-grain macro operators capable of performing unification operations. The scheme for compiling the datafiow graphs eliminates the necessity of any operand matching unit in the data-driven architecture. All memory requests are tagged with register identification (similar to IBM 360/91) to provide an efficient hardware support for context switching. The experimental results indicate the proposed architecture is promising.

  3. Escape of heated ions upstream of quasi-parallel shocks

    NASA Technical Reports Server (NTRS)

    Edmiston, J. P.; Kennel, C. F.; Eichler, D.

    1982-01-01

    A simple theoretical criterion by which quasi-parallel and quasi-perpendicular collisionless shocks may be distinguished is proposed on the basis of an investigation of the free escape of ions from the post-shock plasma into the region upstream of a fast collisionless shock. It was determined that the accessibility of downstream ions to the upstream region depends on upstream magnetic field shock normal angle, in addition to the upstream plasma parameters, with post-shock ions escaping upstream for shock normal angles of less than 45 deg, in agreement with the observed transition between quasi-parallel and quasi-perpendicular shock structure. Upstream ion distribution functions resembling those of observed intermediate ions and beams are also calculated.

  4. Job Management Requirements for NAS Parallel Systems and Clusters

    NASA Technical Reports Server (NTRS)

    Saphir, William; Tanner, Leigh Ann; Traversat, Bernard

    1995-01-01

    A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.

  5. Hash based parallel algorithms for mining association rules

    SciTech Connect

    Shintani, Takahiko; Kitsuregawa, Masaru

    1996-12-31

    In this paper, we propose four parallel algorithms (NPA, SPA, HPA and RPA-ELD) for mining association rules on shared-nothing parallel machines to improve its performance. In NPA, candidate itemsets are just copied amongst all the processors, which can lead to memory overflow for large transaction databases. The remaining three algorithms partition the candidate itemsets over the processors. If it is partitioned simply (SPA), transaction data has to be broadcast to all processors. HPA partitions the candidate itemsets using a hash function to eliminate broadcasting, which also reduces the comparison workload significantly. HPA-ELD fully utilizes the available memory space by detecting the extremely large itemsets and copying them, which is also very effective at flattering the load over the processors. We implemented these algorithms in a shared-nothing environment. Performance evaluations show that the best algorithm, HPA-ELD, attains good linearity on speedup ratio and is effective for handling skew.

  6. A Concept for Airborne Precision Spacing for Dependent Parallel Approaches

    NASA Technical Reports Server (NTRS)

    Barmore, Bryan E.; Baxley, Brian T.; Abbott, Terence S.; Capron, William R.; Smith, Colin L.; Shay, Richard F.; Hubbs, Clay

    2012-01-01

    The Airborne Precision Spacing concept of operations has been previously developed to support the precise delivery of aircraft landing successively on the same runway. The high-precision and consistent delivery of inter-aircraft spacing allows for increased runway throughput and the use of energy-efficient arrivals routes such as Continuous Descent Arrivals and Optimized Profile Descents. This paper describes an extension to the Airborne Precision Spacing concept to enable dependent parallel approach operations where the spacing aircraft must manage their in-trail spacing from a leading aircraft on approach to the same runway and spacing from an aircraft on approach to a parallel runway. Functionality for supporting automation is discussed as well as procedures for pilots and controllers. An analysis is performed to identify the required information and a new ADS-B report is proposed to support these information needs. Finally, several scenarios are described in detail.

  7. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer.

    PubMed

    Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas

    2016-04-01

    Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes due to systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station, however, due to multiple invocations for a large number of subtasks the full task requires a significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods as well as a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer software mpiWrapper has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface has been implemented to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper . PMID:27122320

  8. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer.

    PubMed

    Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas

    2016-04-01

    Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes due to systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station, however, due to multiple invocations for a large number of subtasks the full task requires a significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods as well as a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer software mpiWrapper has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface has been implemented to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .

  9. Feature Clustering for Accelerating Parallel Coordinate Descent

    SciTech Connect

    Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.

    2012-12-06

    We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.

  10. Improved chopper circuit uses parallel transistors

    NASA Technical Reports Server (NTRS)

    1966-01-01

    Parallel transistor chopper circuit operates with one transistor in the forward mode and the other in the inverse mode. By using this method, it acts as a single, symmetrical, bidirectional transistor, and reduces and stabilizes the offset voltage.

  11. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS) runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, are handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.

  12. Parallel programming with PCN. Revision 1

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. In includes both tutorial and reference material. It also presents the basic concepts that underly PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).

  13. Parallel algorithms for dynamically partitioning unstructured grids

    SciTech Connect

    Diniz, P.; Plimpton, S.; Hendrickson, B.; Leland, R.

    1994-10-01

    Grid partitioning is the method of choice for decomposing a wide variety of computational problems into naturally parallel pieces. In problems where computational load on the grid or the grid itself changes as the simulation progresses, the ability to repartition dynamically and in parallel is attractive for achieving higher performance. We describe three algorithms suitable for parallel dynamic load-balancing which attempt to partition unstructured grids so that computational load is balanced and communication is minimized. The execution time of algorithms and the quality of the partitions they generate are compared to results from serial partitioners for two large grids. The integration of the algorithms into a parallel particle simulation is also briefly discussed.

  14. The Nexus task-parallel runtime system

    SciTech Connect

    Foster, I.; Tuecke, S.; Kesselman, C.

    1994-12-31

    A runtime system provides a parallel language compiler with an interface to the low-level facilities required to support interaction between concurrently executing program components. Nexus is a portable runtime system for task-parallel programming languages. Distinguishing features of Nexus include its support for multiple threads of control, dynamic processor acquisition, dynamic address space creation, a global memory model via interprocessor references, and asynchronous events. In addition, it supports heterogeneity at multiple levels, allowing a single computation to utilize different programming languages, executables, processors, and network protocols. Nexus is currently being used as a compiler target for two task-parallel languages: Fortran M and Compositional C++. In this paper, we present the Nexus design, outline techniques used to implement Nexus on parallel computers, show how it is used in compilers, and compare its performance with that of another runtime system.

  15. Modified mesh-connected parallel computers

    SciTech Connect

    Carlson, D.A. )

    1988-10-01

    The mesh-connected parallel computer is an important parallel processing organization that has been used in the past for the design of supercomputing systems. In this paper, the authors explore modifications of a mesh-connected parallel computer for the purpose of increasing the efficiency of executing important application programs. These modifications are made by adding one or more global mesh structures to the processing array. They show how our modifications allow asymptotic improvements in the efficiency of executing computations having low to medium interprocessor communication requirements (e.g., tree computations, prefix computations, finding the connected components of a graph). For computations with high interprocessor communication requirements such as sorting, they show that they offer no speedup. They also compare the modified mesh-connected parallel computer to other similar organizations including the pyramid, the X-tree, and the mesh-of-trees.

  16. Fast and practical parallel polynomial interpolation

    SciTech Connect

    Egecioglu, O.; Gallopoulos, E.; Koc, C.K.

    1987-01-01

    We present fast and practical parallel algorithms for the computation and evaluation of interpolating polynomials. The algorithms make use of fast parallel prefix techniques for the calculation of divided differences in the Newton representation of the interpolating polynomial. For n + 1 given input pairs the proposed interpolation algorithm requires 2 (log (n + 1)) + 2 parallel arithmetic steps and circuit size O(n/sup 2/). The algorithms are numerically stable and their floating-point implementation results in error accumulation similar to that of the widely used serial algorithms. This is in contrast to other fast serial and parallel interpolation algorithms which are subject to much larger roundoff. We demonstrate that in a distributed memory environment context, a cube connected system is very suitable for the algorithms' implementation, exhibiting very small communication cost. As further advantages we note that our techniques do not require equidistant points, preconditioning, or use of the Fast Fourier Transform. 21 refs., 4 figs.

  17. Massively Parallel Computing: A Sandia Perspective

    SciTech Connect

    Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

    1999-05-06

    The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.

  18. The PISCES 2 parallel programming environment

    NASA Technical Reports Server (NTRS)

    Pratt, Terrence W.

    1987-01-01

    PISCES 2 is a programming environment for scientific and engineering computations on MIMD parallel computers. It is currently implemented on a flexible FLEX/32 at NASA Langley, a 20 processor machine with both shared and local memories. The environment provides an extended Fortran for applications programming, a configuration environment for setting up a run on the parallel machine, and a run-time environment for monitoring and controlling program execution. This paper describes the overall design of the system and its implementation on the FLEX/32. Emphasis is placed on several novel aspects of the design: the use of a carefully defined virtual machine, programmer control of the mapping of virtual machine to actual hardware, forces for medium-granularity parallelism, and windows for parallel distribution of data. Some preliminary measurements of storage use are included.

  19. Parallel processing in finite element structural analysis

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K.

    1987-01-01

    A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).

  20. NAS Parallel Benchmarks, Multi-Zone Versions

    NASA Technical Reports Server (NTRS)

    vanderWijngaart, Rob F.; Haopiang, Jin

    2003-01-01

    We describe an extension of the NAS Parallel Benchmarks (NPB) suite that involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy, which is common among structured-mesh production flow solver codes in use at NASA Ames and elsewhere, provides relatively easily exploitable coarse-grain parallelism between meshes. Since the individual application benchmarks also allow fine-grain parallelism themselves, this NPB extension, named NPB Multi-Zone (NPB-MZ), is a good candidate for testing hybrid and multi-level parallelization tools and strategies.