Science.gov

Sample records for not4p functions parallel

  1. Functional MRI using regularized parallel imaging acquisition.

    PubMed

    Lin, Fa-Hsuan; Huang, Teng-Yi; Chen, Nan-Kuei; Wang, Fu-Nien; Stufflebeam, Steven M; Belliveau, John W; Wald, Lawrence L; Kwong, Kenneth K

    2005-08-01

    Parallel MRI techniques reconstruct full-FOV images from undersampled k-space data by using the uncorrelated information from RF array coil elements. One disadvantage of parallel MRI is that the image signal-to-noise ratio (SNR) is degraded because of the reduced number of data samples and the spatially correlated nature of multiple RF receivers. Regularization has been proposed to mitigate the SNR loss arising from the latter cause. Since a static prior must be utilized for regularization, the dynamic contrast-to-noise ratio (CNR) in parallel MRI will be affected. In this paper we investigate the CNR of regularized sensitivity encoding (SENSE) acquisitions. We propose to implement regularized parallel MRI acquisitions in functional MRI (fMRI) experiments by incorporating the prior from combined segmented echo-planar imaging (EPI) acquisition into SENSE reconstructions. We investigated the impact of regularization on the CNR by performing parametric simulations at various BOLD contrasts, acceleration rates, and sizes of the active brain areas. As quantified by receiver operating characteristic (ROC) analysis, the simulations suggest that the detection power of SENSE fMRI can be improved by regularized reconstructions, compared to unregularized reconstructions. Human motor and visual fMRI data acquired at different field strengths and with different array coils also demonstrate that regularized SENSE improves the detection of functionally active brain regions.
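
    As an illustration of the reconstruction the abstract describes, the following sketch solves a Tikhonov-regularized SENSE unfolding problem for one set of aliased pixels, shrinking toward a static prior image. It is a minimal stand-in for the authors' method: the coil count, noise model, and exact regularization form are assumptions.

      import numpy as np

      def regularized_sense_unfold(y, S, psi, x0, lam):
          # Solve min ||y - S x||^2_psi + lam^2 ||x - x0||^2 for one group
          # of R pixel locations folded onto each other by undersampling.
          # y: (n_coils,) aliased values; S: (n_coils, R) sensitivities;
          # psi: coil noise covariance; x0: static prior (e.g. from a
          # combined segmented EPI scan); lam: regularization weight.
          psi_inv = np.linalg.inv(psi)
          A = S.conj().T @ psi_inv @ S + lam**2 * np.eye(S.shape[1])
          b = S.conj().T @ psi_inv @ (y - S @ x0)
          return x0 + np.linalg.solve(A, b)

      rng = np.random.default_rng(0)
      S = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
      x_true = np.array([1.0 + 0j, 0.5 + 0j])
      y = S @ x_true + 0.01 * rng.standard_normal(4)   # 4 coils, R = 2
      x0 = np.array([0.9 + 0j, 0.6 + 0j])              # imperfect prior
      print(regularized_sense_unfold(y, S, np.eye(4), x0, lam=0.1))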

  2. Functional MRI Using Regularized Parallel Imaging Acquisition

    PubMed Central

    Lin, Fa-Hsuan; Huang, Teng-Yi; Chen, Nan-Kuei; Wang, Fu-Nien; Stufflebeam, Steven M.; Belliveau, John W.; Wald, Lawrence L.; Kwong, Kenneth K.

    2013-01-01

    Parallel MRI techniques reconstruct full-FOV images from undersampled k-space data by using the uncorrelated information from RF array coil elements. One disadvantage of parallel MRI is that the image signal-to-noise ratio (SNR) is degraded because of the reduced number of data samples and the spatially correlated nature of multiple RF receivers. Regularization has been proposed to mitigate the SNR loss arising from the latter cause. Since a static prior must be utilized for regularization, the dynamic contrast-to-noise ratio (CNR) in parallel MRI will be affected. In this paper we investigate the CNR of regularized sensitivity encoding (SENSE) acquisitions. We propose to implement regularized parallel MRI acquisitions in functional MRI (fMRI) experiments by incorporating the prior from combined segmented echo-planar imaging (EPI) acquisition into SENSE reconstructions. We investigated the impact of regularization on the CNR by performing parametric simulations at various BOLD contrasts, acceleration rates, and sizes of the active brain areas. As quantified by receiver operating characteristic (ROC) analysis, the simulations suggest that the detection power of SENSE fMRI can be improved by regularized reconstructions, compared to unregularized reconstructions. Human motor and visual fMRI data acquired at different field strengths and with different array coils also demonstrate that regularized SENSE improves the detection of functionally active brain regions. PMID:16032694

  3. Programming in Manticore, a Heterogeneous Parallel Functional Language

    NASA Astrophysics Data System (ADS)

    Fluet, Matthew; Bergstrom, Lars; Ford, Nic; Rainey, Mike; Reppy, John; Shaw, Adam; Xiao, Yingqi

    The Manticore project is an effort to design and implement a new functional language for parallel programming. Unlike many earlier parallel languages, Manticore is a heterogeneous language that supports parallelism at multiple levels. Specifically, the Manticore language combines Concurrent ML-style explicit concurrency with fine-grain, implicitly threaded, parallel constructs. These lectures will introduce the Manticore language and explore a variety of programs written to take advantage of heterogeneous parallelism.

  4. Massively parallel density functional calculations for thousands of atoms: KKRnano

    NASA Astrophysics Data System (ADS)

    Thiess, A.; Zeller, R.; Bolten, M.; Dederichs, P. H.; Blügel, S.

    2012-06-01

    Applications of existing precise electronic-structure methods based on density functional theory are typically limited to the treatment of about 1000 inequivalent atoms, which leaves unresolved many open questions in materials science, e.g., on complex defects, interfaces, dislocations, and nanostructures. KKRnano is a new massively parallel linear scaling all-electron density functional algorithm in the framework of the Korringa-Kohn-Rostoker (KKR) Green's-function method. We conceptualized, developed, and optimized KKRnano for large-scale applications of many thousands of atoms without compromising on the precision of a full-potential all-electron method, i.e., it is a method without any shape approximation of the charge density or potential. A key element of the new method is the iterative solution of the sparse linear Dyson equation, which we parallelized atom by atom, across energy points in the complex plane and for each spin degree of freedom using the message passing interface standard, followed by a lower-level OpenMP parallelization. This hybrid four-level parallelization allows for an efficient use of up to 100000 processors on the latest generation of supercomputers. The iterative solution of the Dyson equation is significantly accelerated, employing preconditioning techniques making use of coarse-graining principles expressed in a block-circulant preconditioner. In this paper, we will describe the important elements of this new algorithm, focusing on the parallelization and preconditioning and showing scaling results for NiPd alloys up to 8192 atoms and 65536 processors. At the end, we present an order-N algorithm for large-scale simulations of metallic systems, making use of the nearsighted principle of the KKR Green's-function approach by introducing a truncation of the electron scattering to a local cluster of atoms, the size of which is determined by the requested accuracy. By exploiting this algorithm, we show linear scaling calculations of more …
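
    For orientation, the sparse linear Dyson equation referred to above has, schematically, the standard KKR form below (site indices n, n', m; G_r is the reference-system Green's function and Δt the difference of single-site scattering matrices); the precise conventions used in KKRnano may differ.

      \[
        G^{nn'}(E) \;=\; G_{\mathrm{r}}^{nn'}(E)
          \;+\; \sum_{m} G_{\mathrm{r}}^{nm}(E)\,\Delta t^{m}(E)\,G^{mn'}(E)
      \]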

  5. Highly parallel oligonucleotide purification and functionalization using reversible chemistry.

    PubMed

    York, Kerri T; Smith, Ryan C; Yang, Rob; Melnyk, Peter C; Wiley, Melissa M; Turk, Casey M; Ronaghi, Mostafa; Gunderson, Kevin L; Steemers, Frank J

    2012-01-01

    We have developed a cost-effective, highly parallel method for purification and functionalization of 5'-labeled oligonucleotides. The approach is based on 5'-hexa-His phase tag purification, followed by exchange of the hexa-His tag for a functional group using reversible reaction chemistry. These methods are suitable for large-scale (micromole to millimole) production of oligonucleotides and are amenable to highly parallel processing of many oligonucleotides individually or in high complexity pools. Examples of the preparation of 5'-biotin, 95-mer, oligonucleotide pools of >40K complexity at micromole scale are shown. These pools are prepared in up to ~16% yield and 90-99% purity. Approaches for using this method in other applications are also discussed.

  6. Highly parallel oligonucleotide purification and functionalization using reversible chemistry

    PubMed Central

    York, Kerri T.; Smith, Ryan C.; Yang, Rob; Melnyk, Peter C.; Wiley, Melissa M.; Turk, Casey M.; Ronaghi, Mostafa; Gunderson, Kevin L.; Steemers, Frank J.

    2012-01-01

    We have developed a cost-effective, highly parallel method for purification and functionalization of 5′-labeled oligonucleotides. The approach is based on 5′-hexa-His phase tag purification, followed by exchange of the hexa-His tag for a functional group using reversible reaction chemistry. These methods are suitable for large-scale (micromole to millimole) production of oligonucleotides and are amenable to highly parallel processing of many oligonucleotides individually or in high complexity pools. Examples of the preparation of 5′-biotin, 95-mer, oligonucleotide pools of >40K complexity at micromole scale are shown. These pools are prepared in up to ~16% yield and 90–99% purity. Approaches for using this method in other applications are also discussed. PMID:22039155

  7. Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments

    NASA Astrophysics Data System (ADS)

    Atwal, Gurinder S.; Kinney, Justin B.

    2016-03-01

    A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.

  8. Administering truncated receive functions in a parallel messaging interface

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-12-09

    Administering truncated receive functions in a parallel messaging interface (`PMI`) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.
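
    The patent text above is a protocol description; the toy sketch below captures its core contract in plain Python: the messaging layer always receives the full transmission, hands the application only the portion it asked to keep, and discards the remainder. All names are hypothetical.

      import queue

      class Endpoint:
          """Toy stand-in for a PMI endpoint with truncated receives."""

          def __init__(self):
              self._inbox = queue.Queue()

          def send(self, data: bytes):
              self._inbox.put(data)        # the full quantity is transferred

          def recv_truncated(self, keep: int) -> bytes:
              data = self._inbox.get()     # PMI receives *all* of the data,
              return data[:keep]           # delivers a prefix, drops the tail

      ep = Endpoint()
      ep.send(b"header:payload-the-application-does-not-want")
      print(ep.recv_truncated(keep=7))     # b'header:'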

  9. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    SciTech Connect

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes, and as a class they have been shown to bind a wide variety of target molecules. Methodology/Principal Findings: High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  10. Functional specialization of parallel motion detection circuits in the fly.

    PubMed

    Joesch, Maximilian; Weber, Franz; Eichner, Hubert; Borst, Alexander

    2013-01-16

    In the fly Drosophila melanogaster, photoreceptor input to motion vision is split into two parallel pathways as represented by first-order interneurons L1 and L2 (Rister et al., 2007; Joesch et al., 2010). However, how these pathways are functionally specialized remains controversial. One study (Eichner et al., 2011) proposed that the L1-pathway evaluates only sequences of brightness increments (ON-ON), while the L2-pathway processes exclusively brightness decrements (OFF-OFF). Another study (Clark et al., 2011) proposed that each of the two pathways evaluates both ON-ON and OFF-OFF sequences. To decide between these alternatives, we recorded from motion-sensitive neurons in flies in which the output from either L1 or L2 was genetically blocked. We found that blocking L1 abolishes ON-ON responses but leaves OFF-OFF responses intact. The opposite was true, when the output from L2 was blocked. We conclude that the L1 and L2 pathways are functionally specialized to detect ON-ON and OFF-OFF sequences, respectively.

  11. Parallel functional programming in Sisal: Fictions, facts, and future

    SciTech Connect

    McGraw, J.R.

    1993-07-01

    This paper provides a status report on the progress of research and development on the functional language Sisal. This project focuses on providing a highly effective method of writing large scientific applications that can efficiently execute on a spectrum of different multiprocessors. The paper includes sections on the language definition, compilation strategies, and programming techniques intended for readers with little or no background with Sisal. The section on performance presents our most recent results on execution speed for shared-memory multiprocessors, our findings using Sisal to develop codes, and our experiences migrating the same source code to different machines. For large programs, the execution performance of Sisal (with minimal supporting advice from the programmer) usually exceeds that of the best available automatic, vector/parallel Fortran compilers. Our evidence also indicates that Sisal programs tend to be shorter in length, faster to write, and clearer to understand than equivalent algorithms in Fortran. The paper concludes with a substantial discussion of common criticisms of the language and our plans for addressing them. Most notably, efficient implementations for distributed memory machines are lacking; an issue we plan to remedy.

  12. Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding

    SciTech Connect

    Zhang, Hong; Zapol, Peter; Dixon, David A.; Wagner, Albert F.; Keceli, Murat

    2015-11-17

    The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPs is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.
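
    The essence of the shift-and-invert spectral transformation is that each shift defines an independent slice of the spectrum, so slices can be assigned to separate process groups. The sketch below reproduces that structure serially with SciPy's shift-invert eigensolver on a toy sparse pencil; it is an illustration of the technique, not the SIPs code.

      import numpy as np
      import scipy.sparse as sp
      from scipy.sparse.linalg import eigsh

      n = 2000                      # toy sparse "Hamiltonian": 1-D Laplacian
      H = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
      M = sp.identity(n, format="csc")     # trivial overlap matrix

      # each shift-invert solve finds the eigenvalues nearest its shift; in
      # SIPs these independent slices are farmed out to separate MPI groups
      for sigma in (0.5, 1.0, 2.0, 3.0):
          vals = eigsh(H, k=8, M=M, sigma=sigma, which="LM",
                       return_eigenvectors=False)
          print(f"shift {sigma}: {np.sort(vals)}")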

  13. Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding

    DOE PAGES

    Zhang, Hong; Zapol, Peter; Dixon, David A.; ...

    2015-11-17

    The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPs is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.

  14. A two-level parallel direct search implementation for arbitrarily sized objective functions

    SciTech Connect

    Hutchinson, S.A.; Shadid, N.; Moffat, H.K.

    1994-12-31

    In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p₂) of processors. If p, the number of processors available, is greater than or equal to 2p₂, then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for an nCUBE 2 implementation.
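
    A minimal sketch of the two-level idea, under the assumption of a compass-style direct search: the poll points of each iteration are evaluated concurrently (the optimization level), while in the paper each objective evaluation may itself run on its own group of p₂ processors (the second level, elided here).

      from concurrent.futures import ProcessPoolExecutor
      import numpy as np

      def objective(x):
          # stand-in for an expensive (possibly itself parallel) PDE solve
          return float(np.sum((x - 1.0) ** 2))

      def parallel_direct_search(x0, step=1.0, tol=1e-6, max_iter=200):
          x, fx = np.asarray(x0, float), objective(x0)
          n = x.size
          with ProcessPoolExecutor() as pool:
              for _ in range(max_iter):
                  polls = [x + step * s * e
                           for e in np.eye(n) for s in (+1.0, -1.0)]
                  values = list(pool.map(objective, polls))  # parallel polls
                  best = int(np.argmin(values))
                  if values[best] < fx:
                      x, fx = polls[best], values[best]      # accept move
                  else:
                      step *= 0.5                            # contract
                      if step < tol:
                          break
          return x, fx

      if __name__ == "__main__":
          print(parallel_direct_search(np.zeros(3)))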

  15. Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments.

    PubMed

    Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke E

    2017-02-14

    Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode, to explain the maximum variance of the data. Functional PARAFAC permits the entities in different modes to be smooth functions or curves, varying over a continuum, rather than a collection of unconnected responses. The existing functional PARAFAC methods handle functions of a one-dimensional argument (e.g., time) only. In this paper, we propose a new extension of functional PARAFAC for handling three-way data whose responses are sequenced along both a two-dimensional domain (e.g., a plane with x- and y-axis coordinates) and a one-dimensional argument. Technically, the proposed method combines PARAFAC with basis function expansion approximations, using a set of piecewise quadratic finite element basis functions for estimating two-dimensional smooth functions and a set of one-dimensional basis functions for estimating one-dimensional smooth functions. In a simulation study, the proposed method appeared to outperform the conventional PARAFAC. We apply the method to EEG data to demonstrate its empirical usefulness.

  16. Efficient time-dependent density functional theory approximations for hybrid density functionals: Analytical gradients and parallelization

    NASA Astrophysics Data System (ADS)

    Petrenko, Taras; Kossmann, Simone; Neese, Frank

    2011-02-01

    In this paper, we present the implementation of efficient approximations to time-dependent density functional theory (TDDFT) within the Tamm-Dancoff approximation (TDA) for hybrid density functionals. For the calculation of the TDDFT/TDA excitation energies and analytical gradients, we combine the resolution of identity (RI-J) algorithm for the computation of the Coulomb terms and the recently introduced "chain of spheres exchange" (COSX) algorithm for the calculation of the exchange terms. It is shown that for extended basis sets, the RIJCOSX approximation leads to speedups of up to 2 orders of magnitude compared to traditional methods, as demonstrated for hydrocarbon chains. The accuracy of the adiabatic transition energies, excited state structures, and vibrational frequencies is assessed on a set of 27 excited states for 25 molecules with the configuration interaction singles and hybrid TDDFT/TDA methods using various basis sets. Compared to the canonical values, the typical error in transition energies is of the order of 0.01 eV. Similar to the ground-state results, excited state equilibrium geometries differ by less than 0.3 pm in the bond distances and 0.5° in the bond angles from the canonical values. The typical error in the calculated excited state normal coordinate displacements is of the order of 0.01, and relative error in the calculated excited state vibrational frequencies is less than 1%. The errors introduced by the RIJCOSX approximation are, thus, insignificant compared to the errors related to the approximate nature of the TDDFT methods and basis set truncation. For TDDFT/TDA energy and gradient calculations on Ag-TB2-helicate (156 atoms, 2732 basis functions), it is demonstrated that the COSX algorithm parallelizes almost perfectly (speedup ~26-29 for 30 processors). The exchange-correlation terms also parallelize well (speedup ~27-29 for 30 processors). The solution of the Z-vector equations shows a speedup of ~24 on 30 processors. The …

  17. Efficient time-dependent density functional theory approximations for hybrid density functionals: analytical gradients and parallelization.

    PubMed

    Petrenko, Taras; Kossmann, Simone; Neese, Frank

    2011-02-07

    In this paper, we present the implementation of efficient approximations to time-dependent density functional theory (TDDFT) within the Tamm-Dancoff approximation (TDA) for hybrid density functionals. For the calculation of the TDDFT/TDA excitation energies and analytical gradients, we combine the resolution of identity (RI-J) algorithm for the computation of the Coulomb terms and the recently introduced "chain of spheres exchange" (COSX) algorithm for the calculation of the exchange terms. It is shown that for extended basis sets, the RIJCOSX approximation leads to speedups of up to 2 orders of magnitude compared to traditional methods, as demonstrated for hydrocarbon chains. The accuracy of the adiabatic transition energies, excited state structures, and vibrational frequencies is assessed on a set of 27 excited states for 25 molecules with the configuration interaction singles and hybrid TDDFT/TDA methods using various basis sets. Compared to the canonical values, the typical error in transition energies is of the order of 0.01 eV. Similar to the ground-state results, excited state equilibrium geometries differ by less than 0.3 pm in the bond distances and 0.5° in the bond angles from the canonical values. The typical error in the calculated excited state normal coordinate displacements is of the order of 0.01, and relative error in the calculated excited state vibrational frequencies is less than 1%. The errors introduced by the RIJCOSX approximation are, thus, insignificant compared to the errors related to the approximate nature of the TDDFT methods and basis set truncation. For TDDFT/TDA energy and gradient calculations on Ag-TB2-helicate (156 atoms, 2732 basis functions), it is demonstrated that the COSX algorithm parallelizes almost perfectly (speedup ~26-29 for 30 processors). The exchange-correlation terms also parallelize well (speedup ~27-29 for 30 processors). The solution of the Z-vector equations shows a speedup of ~24 on 30 processors. The …

  18. Parallel computers

    SciTech Connect

    Treleaven, P.

    1989-01-01

    This book presents an introduction to object-oriented, functional, and logic parallel computing on which the fifth generation of computer systems will be based. Coverage includes concepts for parallel computing languages, a parallel object-oriented system (DOOM) and its language (POOL), an object-oriented multilevel VLSI simulator using POOL, and implementation of lazy functional languages on parallel architectures.

  19. PDoublePop: An implementation of parallel genetic algorithm for function optimization

    NASA Astrophysics Data System (ADS)

    Tsoulos, Ioannis G.; Tzallas, Alexandros; Tsalikakis, Dimitris

    2016-12-01

    Software for the implementation of parallel genetic algorithms is presented in this article. The underlying genetic algorithm aims to locate the global minimum of a multidimensional function inside a rectangular hyperbox. The proposed software, named PDoublePop, implements a client-server model for parallel genetic algorithms with advanced features for the local genetic algorithms such as: an enhanced stopping rule, an advanced mutation scheme and periodical application of a local search procedure. The user may code the objective function either in C++ or in Fortran77. The method is tested on a series of well-known test functions and the results are reported.
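
    A minimal master/slave genetic algorithm in the spirit of the abstract is sketched below: a server process distributes fitness evaluations of a well-known test function to worker processes. PDoublePop's actual features (enhanced stopping rule, advanced mutation, periodic local search) are omitted, and all parameter choices are assumptions.

      import numpy as np
      from concurrent.futures import ProcessPoolExecutor

      def rastrigin(x):
          # classic multimodal test function; global minimum 0 at x = 0
          return float(10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))

      def genetic_minimize(dim=5, pop=60, gens=200, lo=-5.12, hi=5.12, seed=0):
          rng = np.random.default_rng(seed)
          P = rng.uniform(lo, hi, (pop, dim))
          with ProcessPoolExecutor() as pool:        # server farms out work
              for _ in range(gens):
                  fit = np.fromiter(pool.map(rastrigin, P), float, pop)
                  elite = P[np.argsort(fit)[: pop // 2]]
                  # uniform crossover of random elite parents + mutation
                  parents = elite[rng.integers(0, len(elite), (pop, 2))]
                  mask = rng.random((pop, dim)) < 0.5
                  children = np.where(mask, parents[:, 0], parents[:, 1])
                  children += rng.normal(0.0, 0.1, children.shape)
                  P = np.clip(children, lo, hi)
                  P[0] = elite[0]                    # elitism: keep the best
          fit = np.array([rastrigin(x) for x in P])
          return P[np.argmin(fit)], float(fit.min())

      if __name__ == "__main__":
          x_best, f_best = genetic_minimize()
          print(f"best f = {f_best:.4f}")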

  20. Parallelization of the polarizable embedding scheme for higher-order response functions

    NASA Astrophysics Data System (ADS)

    Hykkerud Steindal, Arnfinn; Magnus Haugaard Olsen, Jógvan; Frediani, Luca; Kongsted, Jacob; Ruud, Kenneth

    2012-10-01

    We present a parallel implementation of the Polarizable Embedding (PE) method, an advanced quantum mechanics/molecular mechanics (QM/MM) approach, for Hartree-Fock (PE-HF) and density functional theory (PE-DFT). The parallelization includes calculations of energies and linear, quadratic, and cubic response functions. The couplings to the QM system due to the polarizable embedding potential have been implemented using a master/slave approach. The implementation shows good scaling behaviour, demonstrated through calculations on a small (a water molecule in a bulk of water molecules) and a larger system (Green Fluorescent Protein (GFP)).

  21. Methods, systems, and computer program products for implementing function-parallel network firewall

    DOEpatents

    Fulp, Errin W [Winston-Salem, NC; Farley, Ryan J [Winston-Salem, NC

    2011-10-11

    Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.
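
    The sketch below illustrates the claim's structure, with rule matching reduced to exact port lookup: each node holds a disjoint portion of the rule set, every packet is offered to all nodes, and their verdicts are combined. Real rule sets have ordered, first-match semantics that this toy ignores.

      from dataclasses import dataclass

      @dataclass
      class Rule:
          dst_port: int
          action: str                       # "accept" or "drop"

      def make_node(rules):
          # a firewall node filters using only its slice of the rule set
          def node(packet):
              for r in rules:
                  if packet["dst_port"] == r.dst_port:
                      return r.action
              return None                   # no opinion: rule held elsewhere
          return node

      rule_set = [Rule(22, "accept"), Rule(80, "accept"), Rule(23, "drop")]
      nodes = [make_node(rule_set[:2]), make_node(rule_set[2:])]

      def firewall(packet, default="drop"):
          verdicts = (n(packet) for n in nodes)   # in hardware: concurrent
          return next((v for v in verdicts if v is not None), default)

      print(firewall({"dst_port": 23}))     # drop   (matched on second node)
      print(firewall({"dst_port": 80}))     # accept (matched on first node)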

  22. Parallel sites implicate functional convergence of the hearing gene prestin among echolocating mammals.

    PubMed

    Liu, Zhen; Qi, Fei-Yan; Zhou, Xin; Ren, Hai-Qing; Shi, Peng

    2014-09-01

    Echolocation is a sensory system whereby certain mammals navigate and forage using sound waves, usually in environments where visibility is limited. Curiously, echolocation has evolved independently in bats and whales, which occupy entirely different environments. Based on this phenotypic convergence, recent studies identified several echolocation-related genes with parallel sites at the protein sequence level among different echolocating mammals, and among these, prestin seems the most promising. Although previous studies analyzed the evolutionary mechanism of prestin, the functional roles of the parallel sites in the evolution of mammalian echolocation are not clear. By functional assays, we show that a key parameter of prestin function, 1/α, is increased in all echolocating mammals and that the N7T parallel substitution accounted for this functional convergence. Moreover, another parameter, V1/2, was shifted toward the depolarization direction in a toothed whale, the bottlenose dolphin (Tursiops truncatus) and a constant-frequency (CF) bat, the Stoliczka's trident bat (Aselliscus stoliczkanus). The parallel site of I384T between toothed whales and CF bats was responsible for this functional convergence. Furthermore, the two parameters (1/α and V1/2) were correlated with mammalian high-frequency hearing, suggesting that the convergent changes of the prestin function in echolocating mammals may play important roles in mammalian echolocation. To our knowledge, these findings present the functional patterns of echolocation-related genes in echolocating mammals for the first time and rigorously demonstrate adaptive parallel evolution at the protein sequence level, paving the way to insights into the molecular mechanism underlying mammalian echolocation.

  23. Parallel-META 3: Comprehensive taxonomical and functional analysis platform for efficient comparison of microbial communities

    PubMed Central

    Jing, Gongchao; Sun, Zheng; Wang, Honglei; Gong, Yanhai; Huang, Shi; Ning, Kang; Xu, Jian; Su, Xiaoquan

    2017-01-01

    The number of metagenomes is increasing rapidly. However, current methods for metagenomic analysis are limited in their capability for in-depth data mining among large numbers of microbiomes, each of which carries a complex community structure. Moreover, the complexity of configuring and operating a computational pipeline also hinders efficient data processing for the end users. In this work we introduce Parallel-META 3, a comprehensive and fully automatic computational toolkit for rapid data mining among metagenomic datasets, with advanced features including 16S rRNA extraction for shotgun sequences, 16S rRNA copy number calibration, 16S rRNA based functional prediction, diversity statistics, bio-marker selection, interaction network construction, vector-graph-based visualization and parallel computing. Application of Parallel-META 3 on 5,337 samples with 1,117,555,208 sequences from diverse studies and platforms showed it could produce similar results to QIIME and PICRUSt with much faster speed and lower memory usage, which demonstrates its ability to unravel the taxonomical and functional dynamics patterns across large datasets and elucidate ecological links between the microbiome and the environment. Parallel-META 3 is implemented in C/C++ and R, and integrated into an executive package for rapid installation and easy access under Linux and Mac OS X. Both binary and source code packages are available at http://bioinfo.single-cell.cn/parallel-meta.html. PMID:28079128

  24. Parallel-META 3: Comprehensive taxonomical and functional analysis platform for efficient comparison of microbial communities.

    PubMed

    Jing, Gongchao; Sun, Zheng; Wang, Honglei; Gong, Yanhai; Huang, Shi; Ning, Kang; Xu, Jian; Su, Xiaoquan

    2017-01-12

    The number of metagenomes is increasing rapidly. However, current methods for metagenomic analysis are limited in their capability for in-depth data mining among large numbers of microbiomes, each of which carries a complex community structure. Moreover, the complexity of configuring and operating a computational pipeline also hinders efficient data processing for the end users. In this work we introduce Parallel-META 3, a comprehensive and fully automatic computational toolkit for rapid data mining among metagenomic datasets, with advanced features including 16S rRNA extraction for shotgun sequences, 16S rRNA copy number calibration, 16S rRNA based functional prediction, diversity statistics, bio-marker selection, interaction network construction, vector-graph-based visualization and parallel computing. Application of Parallel-META 3 on 5,337 samples with 1,117,555,208 sequences from diverse studies and platforms showed it could produce similar results to QIIME and PICRUSt with much faster speed and lower memory usage, which demonstrates its ability to unravel the taxonomical and functional dynamics patterns across large datasets and elucidate ecological links between the microbiome and the environment. Parallel-META 3 is implemented in C/C++ and R, and integrated into an executive package for rapid installation and easy access under Linux and Mac OS X. Both binary and source code packages are available at http://bioinfo.single-cell.cn/parallel-meta.html.

  25. Extending the functionalities of Cartesian grid solvers: Viscous effects modeling and MPI parallelization

    NASA Astrophysics Data System (ADS)

    Marshall, David D.

    With the renewed interest in Cartesian gridding methodologies for the ease and speed of gridding complex geometries in addition to the simplicity of the control volumes used in the computations, it has become important to investigate ways of extending the existing Cartesian grid solver functionalities. This includes developing methods of modeling the viscous effects in order to utilize Cartesian grid solvers for accurate drag predictions and addressing the issues related to the distributed memory parallelization of Cartesian solvers. This research presents advances in two areas of interest in Cartesian grid solvers: viscous effects modeling and MPI parallelization. The development of viscous effects modeling using solely Cartesian grids has been hampered by the widely varying control volume sizes associated with the mesh refinement and the cut cells associated with the solid surface. This problem is being addressed by using physically based modeling techniques to update the state vectors of the cut cells and removing them from the finite volume integration scheme. This work is performed on a new Cartesian grid solver, NASCART-GT, with modifications to its cut cell functionality. The development of MPI parallelization addresses issues associated with utilizing Cartesian solvers on distributed memory parallel environments. This work is performed on an existing Cartesian grid solver, CART3D, with modifications to its parallelization methodology.

  26. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    Charon is a software toolkit that enables engineers to develop high-performing message-passing programs in a convenient and piecemeal fashion. Emphasis is on rapid program development and prototyping. In this report a detailed description of the functional design of the toolkit is presented. It is illustrated by the stepwise parallelization of two representative code examples.

  27. Analysis and selection of optimal function implementations in massively parallel computer

    DOEpatents

    Archer, Charles Jens; Peters, Amanda; Ratterman, Joseph D.

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
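
    A toy version of that flow, with names and the dispatch heuristic invented for illustration: benchmark each implementation under varying input sizes, record the winner per size, and generate a selector that dispatches on the size of the actual argument.

      import timeit

      def sum_loop(xs):
          s = 0
          for x in xs:
              s += x
          return s

      def sum_builtin(xs):
          return sum(xs)

      IMPLS = {"loop": sum_loop, "builtin": sum_builtin}

      def collect_performance(impls, inputs, reps=5):
          # the data-collection step: time every implementation under
          # varying input parameters and keep the winner per input size
          best = {}
          for xs in inputs:
              t = {name: timeit.timeit(lambda f=f, xs=xs: f(xs), number=reps)
                   for name, f in impls.items()}
              best[len(xs)] = min(t, key=t.get)
          return best

      SELECTION = collect_performance(IMPLS, [list(range(n)) for n in (10, 10000)])

      def fast_sum(xs):
          # generated "selection code": call the implementation that won
          # at the nearest benchmarked input size
          size = min(SELECTION, key=lambda n: abs(n - len(xs)))
          return IMPLS[SELECTION[size]](xs)

      print(fast_sum(list(range(100))))     # 4950, via the selected winner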

  28. Massively parallel GPU-accelerated minimization of classical density functional theory

    NASA Astrophysics Data System (ADS)

    Stopper, Daniel; Roth, Roland

    2017-08-01

    In this paper, we discuss the ability to numerically minimize the grand potential of hard disks in two-dimensional and of hard spheres in three-dimensional space within the framework of classical density functional and fundamental measure theory on modern graphics cards. Our main finding is that a massively parallel minimization leads to an enormous performance gain in comparison to standard sequential minimization schemes. Furthermore, the results indicate that in complex multi-dimensional situations, a heavy parallel minimization of the grand potential seems to be mandatory in order to reach a reasonable balance between accuracy and computational cost.
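
    The hard-sphere functional itself is too involved to reproduce here, but the grid-parallel update pattern the authors exploit can be shown on a toy 1-D mean-field functional: every grid point of the density profile is updated simultaneously in a damped Picard iteration. The functional, kernel, and parameters are all invented for illustration.

      import numpy as np

      # damped Picard iteration for a toy 1-D mean-field DFT:
      #   rho(x) = rho_b * exp(-beta * (Vext(x) + (w * rho)(x))),
      # with * a periodic convolution evaluated by FFT; every grid point
      # updates simultaneously, which is what a GPU minimizer exploits
      n, L, beta, rho_b = 512, 10.0, 1.0, 0.5
      x = np.linspace(0.0, L, n, endpoint=False)
      dx = L / n
      Vext = 2.0 * np.exp(-((x - L / 2) ** 2))      # repulsive bump
      w = -0.1 * np.exp(-np.minimum(x, L - x))      # attractive toy kernel

      rho = np.full(n, rho_b)
      for _ in range(2000):
          conv = dx * np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(rho)))
          rho = 0.9 * rho + 0.1 * rho_b * np.exp(-beta * (Vext + conv))
      print(round(float(rho.min()), 4), round(float(rho.max()), 4))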

  29. DGDFT: A massively parallel method for large scale density functional theory calculations

    SciTech Connect

    Hu, Wei; Yang, Chao; Lin, Lin

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  30. DGDFT: A massively parallel method for large scale density functional theory calculations.

    PubMed

    Hu, Wei; Lin, Lin; Yang, Chao

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  31. Many-to-one form-to-function mapping weakens parallel morphological evolution.

    PubMed

    Thompson, Cole J; Ahmed, Newaz I; Veen, Thor; Peichel, Catherine L; Hendry, Andrew P; Bolnick, Daniel I; Stuart, Yoel E

    2017-09-07

    Evolutionary ecologists aim to explain and predict evolutionary change under different selective regimes. Theory suggests that such evolutionary prediction should be more difficult for biomechanical systems in which different trait combinations generate the same functional output: "many-to-one mapping". Many-to-one mapping of phenotype to function enables multiple morphological solutions to meet the same adaptive challenges. Therefore, many-to-one mapping should undermine parallel morphological evolution, and hence evolutionary predictability, even when selection pressures are shared among populations. Studying 16 replicate pairs of lake- and stream-adapted threespine stickleback (Gasterosteus aculeatus), we quantified three parts of the teleost feeding apparatus and used biomechanical models to calculate their expected functional outputs. The three feeding structures differed in their form-to-function relationship from one-to-one (lower jaw lever ratio) to increasingly many-to-one (buccal suction index, opercular 4-bar linkage). We tested for (1) weaker linear correlations between phenotype and calculated function, and (2) less parallel evolution across lake-stream pairs, in the many-to-one systems relative to the one-to-one system. We confirm both predictions, thus supporting the theoretical expectation that increasing many-to-one mapping undermines parallel evolution. Therefore, sole consideration of morphological variation within and among populations might not serve as a proxy for functional variation when multiple adaptive trait combinations exist.
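
    The opercular 4-bar linkage is the cleanest way to see many-to-one mapping in code. The sketch below computes a linkage's kinematic transmission (KT) by finite differences and then searches for a clearly different set of link lengths producing the same KT; the geometric conventions and the input angle are assumptions, not the authors' exact model.

      import numpy as np

      def four_bar_kt(g, a, b, c, theta=np.deg2rad(60.0), h=1e-5):
          # KT = |d(output angle)/d(input angle)| for a planar four-bar
          # with ground link g, input a, coupler b, output c (open branch)
          def psi(th):
              ax, ay = a * np.cos(th), a * np.sin(th)   # input-link tip
              d = np.hypot(ax - g, ay)                  # diagonal to pivot
              phi1 = np.arctan2(ay, ax - g)
              phi2 = np.arccos((c**2 + d**2 - b**2) / (2 * c * d))
              return phi1 + phi2
          return abs(psi(theta + h) - psi(theta - h)) / (2 * h)

      target = four_bar_kt(g=1.0, a=0.30, b=0.95, c=0.40)
      rng = np.random.default_rng(1)
      with np.errstate(invalid="ignore"):               # infeasible -> nan
          for _ in range(100_000):
              a, b, c = rng.uniform(0.2, 1.0, 3)
              kt = four_bar_kt(1.0, a, b, c)
              if np.isfinite(kt) and abs(kt - target) < 1e-3:
                  print(f"KT {target:.3f} from a=0.30, b=0.95, c=0.40 "
                        f"and again from a={a:.2f}, b={b:.2f}, c={c:.2f}")
                  break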

  32. Storing files in a parallel computing system based on user-specified parser function

    SciTech Connect

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

    2014-10-21

    Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.
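
    A minimal sketch of the storage path, with every name and the parser's "semantic requirement" invented for illustration: the application-supplied parser decides which files are persisted and extracts the metadata stored alongside them.

      import json, pathlib, tempfile

      def store_files(files, root, parser):
          # persist only files the parser accepts; keep its extracted
          # metadata in a side index usable for later searching
          root = pathlib.Path(root)
          index = {}
          for name, payload in files.items():
              meta = parser(payload)
              if meta is None:              # semantic requirement not met
                  continue
              (root / name).write_bytes(payload)
              index[name] = meta
          (root / "_index.json").write_text(json.dumps(index))
          return index

      def parser(payload: bytes):
          # hypothetical application parser: accept records starting with
          # a magic tag and record their length as searchable metadata
          if not payload.startswith(b"REC1"):
              return None
          return {"tag": "REC1", "length": len(payload)}

      files = {"a.dat": b"REC1xxxx", "b.dat": b"junk"}
      print(store_files(files, tempfile.mkdtemp(), parser))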

  33. Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project.

    PubMed

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L

    2012-06-13

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  34. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    NASA Astrophysics Data System (ADS)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

    2012-06-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  35. Micro/Nanoscale Parallel Patterning of Functional Biomolecules, Organic Fluorophores and Colloidal Nanocrystals

    NASA Astrophysics Data System (ADS)

    Sabella, S.; Brunetti, V.; Vecchio, G.; Torre, A. Della; Rinaldi, R.; Cingolani, R.; Pompa, P. P.

    2009-10-01

    We describe the design and optimization of a reliable strategy that combines self-assembly and lithographic techniques, leading to very precise micro-/nanopositioning of biomolecules for the realization of micro- and nanoarrays of functional DNA and antibodies. Moreover, based on the covalent immobilization of stable and versatile SAMs of programmable chemical reactivity, this approach constitutes a general platform for the parallel site-specific deposition of a wide range of molecules such as organic fluorophores and water-soluble colloidal nanocrystals.

  36. Implementation of linear-scaling plane wave density functional theory on parallel computers

    NASA Astrophysics Data System (ADS)

    Skylaris, Chris-Kriton; Haynes, Peter D.; Mostofi, Arash A.; Payne, Mike C.

    We describe the algorithms we have developed for linear-scaling plane wave density functional calculations on parallel computers as implemented in the onetep program. We outline how onetep achieves plane wave accuracy with a computational cost which increases only linearly with the number of atoms by optimising directly the single-particle density matrix expressed in a psinc basis set. We describe in detail the novel algorithms we have developed for computing with the psinc basis set the quantities needed in the evaluation and optimisation of the total energy within our approach. For our parallel computations we use the general Message Passing Interface (MPI) library of subroutines to exchange data between processors. Accordingly, we have developed efficient schemes for distributing data and computational load to processors in a balanced manner. We describe these schemes in detail and in relation to our algorithms for computations with a psinc basis. Results of tests on different materials show that onetep is an efficient parallel code that should be able to take advantage of a wide range of parallel computer architectures.

  37. GPU-based parallel group ICA for functional magnetic resonance data.

    PubMed

    Jing, Yanshan; Zeng, Weiming; Wang, Nizhuan; Ren, Tianlong; Shi, Yingchao; Yin, Jun; Xu, Qi

    2015-04-01

    The goal of our study is to develop a fast parallel implementation of group independent component analysis (ICA) for functional magnetic resonance imaging (fMRI) data using graphics processing units (GPU). Though ICA has become a standard method to identify brain functional connectivity of fMRI data, it is computationally intensive, especially for group data analysis. GPUs, with higher parallel computation power and lower cost, are used for general-purpose computing and can contribute significantly to fMRI data analysis. In this study, a parallel group ICA (PGICA) on GPU, mainly consisting of GPU-based PCA using SVD and Infomax-ICA, is presented. In comparison to serial group ICA, the proposed method demonstrated both a significant speedup of 6-11 times and comparable accuracy of functional networks in our experiments. This proposed method is expected to enable real-time post-processing for fMRI data analysis.
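
    The pipeline structure (dimension reduction by PCA followed by ICA unmixing of temporally concatenated data) can be sketched on the CPU in a few lines; here scikit-learn's FastICA stands in for the paper's GPU Infomax implementation, and the synthetic data dimensions are arbitrary.

      import numpy as np
      from sklearn.decomposition import PCA, FastICA

      rng = np.random.default_rng(0)
      n_sub, n_t, n_vox, n_comp = 4, 100, 500, 5
      maps = rng.laplace(size=(n_comp, n_vox))      # shared spatial sources
      runs = [rng.standard_normal((n_t, n_comp)) @ maps
              + 0.05 * rng.standard_normal((n_t, n_vox))
              for _ in range(n_sub)]

      X = np.vstack(runs)                           # temporal concatenation
      Xr = PCA(n_components=n_comp).fit_transform(X.T)   # voxels x comps
      S = FastICA(n_components=n_comp, random_state=0).fit_transform(Xr)
      print(S.T.shape)                              # (5, 500) estimated maps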

  38. Parallel Alterations of Functional Connectivity during Execution and Imagination after Motor Imagery Learning

    PubMed Central

    Zhang, Rushao; Hui, Mingqi; Long, Zhiying; Zhao, Xiaojie; Yao, Li

    2012-01-01

    Background: Neural substrates underlying motor learning have been widely investigated with neuroimaging technologies. Investigations have illustrated the critical regions of motor learning and further revealed parallel alterations of functional activation during imagination and execution after learning. However, little is known about the functional connectivity associated with motor learning, especially motor imagery learning, although the benefits of functional connectivity analysis are attracting more attention in related explorations. We explored whether motor imagery (MI) and motor execution (ME) shared parallel alterations of functional connectivity after MI learning. Methodology/Principal Findings: Graph theory analysis, which is widely used in functional connectivity exploration, was performed on the functional magnetic resonance imaging (fMRI) data of MI and ME tasks before and after 14 days of consecutive MI learning. The control group had no learning. Two measures, connectivity degree and interregional connectivity, were calculated and further assessed at a statistical level. Two interesting results were obtained: (1) The connectivity degree of the right posterior parietal lobe decreased in both MI and ME tasks after MI learning in the experimental group; (2) The parallel alterations of interregional connectivity related to the right posterior parietal lobe occurred in the supplementary motor area for both tasks. Conclusions/Significance: These computational results may provide the following insights: (1) The establishment of motor schema through MI learning may induce the significant decrease of connectivity degree in the posterior parietal lobe; (2) The decreased interregional connectivity between the supplementary motor area and the right posterior parietal lobe in post-test implicates the dissociation between motor learning and task performing. These findings and explanations further revealed the neural substrates underpinning MI learning and supported that …
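
    Of the two graph measures used, connectivity degree is the simpler: correlate regional time series, threshold to a binary graph, and count each node's edges. A minimal sketch (threshold and data invented for illustration):

      import numpy as np

      def connectivity_degree(ts, threshold=0.3):
          R = np.corrcoef(ts)                   # regions x regions
          np.fill_diagonal(R, 0.0)              # ignore self-connections
          return (np.abs(R) > threshold).sum(axis=1)

      rng = np.random.default_rng(0)
      ts = rng.standard_normal((10, 200))       # 10 regions, 200 time points
      ts[3] += 0.8 * ts[7]                      # plant one strong connection
      print(connectivity_degree(ts))            # nodes 3 and 7 stand out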

  39. Comparison of parallel acquisition techniques generalized autocalibrating partially parallel acquisitions (GRAPPA) and modified sensitivity encoding (mSENSE) in functional MRI (fMRI) at 3T.

    PubMed

    Preibisch, Christine; Wallenhorst, Tim; Heidemann, Robin; Zanella, Friedhelm E; Lanfermann, Heinrich

    2008-03-01

    To evaluate the parallel acquisition techniques, generalized autocalibrating partially parallel acquisitions (GRAPPA) and modified sensitivity encoding (mSENSE), and to determine imaging parameters maximizing sensitivity toward functional activation at 3T. A total of eight imaging protocols with different parallel imaging techniques (GRAPPA and mSENSE) and reduction factors (R = 1, 2, 3) were compared at different matrix sizes (64 and 128) with respect to temporal noise characteristics, artifact behavior, and sensitivity toward functional activation. Echo planar imaging (EPI) with GRAPPA and a reduction factor of 2 revealed image quality and sensitivity similar to those of full k-space EPI. A higher incidence of artifacts and a marked sensitivity loss occurred at R = 3. Even though the same eight-channel head coil was used for signal detection in all experiments, GRAPPA generally showed more benign patterns of spatially-varying noise amplification, and mSENSE was also more susceptible to residual unfolding artifacts than GRAPPA. At 3T and a reduction factor of 2, parallel imaging can be used with only a small penalty with regard to sensitivity. With our implementation and coil setup, the performance of GRAPPA was clearly superior to mSENSE. Thus, it seems advisable to pay special attention to the employed parallel imaging method and its implementation.

  40. Locating and computing in parallel all the simple roots of special functions using PVM

    NASA Astrophysics Data System (ADS)

    Plagianakos, V. P.; Nousis, N. K.; Vrahatis, M. N.

    2001-08-01

    An algorithm is proposed for locating and computing in parallel and with certainty all the simple roots of any twice continuously differentiable function in any specific interval. To compute with certainty all the roots, the proposed method relies heavily on knowledge of the total number of roots within the given interval. To obtain this information we use results from topological degree theory and, in particular, the Kronecker-Picard approach. This theory gives a formula for the computation of the total number of roots of a system of equations within a given region, which can be computed in parallel. With this tool in hand, we construct a parallel procedure for the localization and isolation of all the roots by dividing the given region successively and applying the above formula to these subregions until the final domains contain at most one root. The subregions with no roots are discarded, while for the rest a modification of the well-known bisection method is employed for the computation of the contained root. The new aspect of the present contribution is that the computation of the total number of zeros using the Kronecker-Picard integral, as well as the localization and computation of all the roots, is performed in parallel using the parallel virtual machine (PVM). PVM is an integrated set of software tools and libraries that emulates a general-purpose, flexible, heterogeneous concurrent computing framework on interconnected computers of varied architectures. The proposed algorithm has large granularity and low synchronization, and is robust. It has been implemented and tested, and our experience is that it can massively compute with certainty all the roots in a given interval. Performance information from massive computations related to a recently proposed conjecture due to Elbert (this issue, J. Comput. Appl. Math. 133 (2001) 65-83) is reported.
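
    The divide-isolate-bisect structure is easy to reproduce; in the sketch below a sign-change count on a fine grid stands in for the Kronecker-Picard root count, and a process pool stands in for PVM. The test function is arbitrary.

      import numpy as np
      from concurrent.futures import ProcessPoolExecutor
      from scipy.optimize import bisect

      def f(x):
          return np.sin(3 * x) - 0.3 * x        # several simple roots

      def isolate(a, b, n_grid=64):
          # subdivide until each subinterval brackets at most one root
          xs = np.linspace(a, b, n_grid)
          fs = f(xs)
          flips = np.nonzero(np.sign(fs[:-1]) != np.sign(fs[1:]))[0]
          return [(xs[i], xs[i + 1]) for i in flips]

      def refine(bracket):
          return bisect(f, *bracket, xtol=1e-12)

      if __name__ == "__main__":
          brackets = isolate(-3.0, 3.0)
          with ProcessPoolExecutor() as pool:   # one root per worker
              roots = sorted(pool.map(refine, brackets))
          print(np.round(roots, 6))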

  1. Parallel fixed point implementation of a radial basis function network in an FPGA.

    PubMed

    de Souza, Alisson C D; Fernandes, Marcelo A C

    2014-09-29

    This paper proposes a parallel fixed point radial basis function (RBF) artificial neural network (ANN), implemented in a field programmable gate array (FPGA) trained online with a least mean square (LMS) algorithm. The processing time and occupied area were analyzed for various fixed point formats. The problems of precision of the ANN response for nonlinear classification using the XOR gate and interpolation using the sine function were also analyzed in a hardware implementation. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA.
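
    For readers unfamiliar with the underlying training scheme, the following minimal floating-point sketch shows a Gaussian RBF network updated online by the LMS rule on the XOR task mentioned above; it makes no attempt to reproduce the paper's fixed-point arithmetic or FPGA mapping, and all names in it are ours.

    ```python
    # Minimal floating-point RBF network trained online with the LMS rule;
    # purely illustrative, with no fixed-point arithmetic or FPGA mapping.
    import numpy as np

    class OnlineRBF:
        def __init__(self, centers, width, lr=0.1):
            self.centers = np.asarray(centers, dtype=float)
            self.width = width              # shared Gaussian width
            self.w = np.zeros(len(centers)) # linear output weights
            self.lr = lr                    # LMS step size

        def _phi(self, x):
            d2 = np.sum((self.centers - x) ** 2, axis=1)
            return np.exp(-d2 / (2.0 * self.width ** 2))

        def predict(self, x):
            return float(self._phi(x) @ self.w)

        def update(self, x, target):
            phi = self._phi(x)
            err = target - phi @ self.w
            self.w += self.lr * err * phi   # LMS rule: w <- w + mu * e * phi

    # XOR, the nonlinear classification test used in the paper
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0.0, 1.0, 1.0, 0.0])
    net = OnlineRBF(centers=X, width=0.7)
    for _ in range(500):
        for xi, yi in zip(X, y):
            net.update(xi, yi)
    print([round(net.predict(xi), 2) for xi in X])   # approx [0, 1, 1, 0]
    ```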

  2. Parallel Fixed Point Implementation of a Radial Basis Function Network in an FPGA

    PubMed Central

    de Souza, Alisson C. D.; Fernandes, Marcelo A. C.

    2014-01-01

    This paper proposes a parallel fixed point radial basis function (RBF) artificial neural network (ANN), implemented in a field programmable gate array (FPGA) trained online with a least mean square (LMS) algorithm. The processing time and occupied area were analyzed for various fixed point formats. The problems of precision of the ANN response for nonlinear classification using the XOR gate and interpolation using the sine function were also analyzed in a hardware implementation. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA. PMID:25268918

  3. A Parallel Independent Component Analysis Approach to Investigate Genomic Influence on Brain Function

    PubMed Central

    Liu, Jingyu; Demirci, Oguz; Calhoun, Vince D.

    2009-01-01

    Relationships between genomic data and functional brain images are of great interest but require new analysis approaches to integrate the high-dimensional data types. This letter presents an extension of a technique called parallel independent component analysis (paraICA), which enables the joint analysis of multiple modalities, including interconnections between them. We extend our earlier work by allowing for multiple interconnections and by providing important overfitting controls. Performance was assessed by simulations under different conditions, which indicated that reliable results can be extracted by properly balancing overfitting and underfitting. An application to functional magnetic resonance images and a single nucleotide polymorphism array produced interesting findings. PMID:19834575

  4. A cost-effective methodology for the design of massively-parallel VLSI functional units

    NASA Technical Reports Server (NTRS)

    Venkateswaran, N.; Sriram, G.; Desouza, J.

    1993-01-01

    In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique, functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the performance of functional units designed by our method yields promising results.

  5. Parallel functional category deficits in clauses and nominal phrases: The case of English agrammatism

    PubMed Central

    Wang, Honglei; Yoshida, Masaya; Thompson, Cynthia K.

    2015-01-01

    Individuals with agrammatic aphasia exhibit restricted patterns of impairment of functional morphemes; however, the syntactic characterization of the impairment is controversial. Previous studies have focused on functional morphology in clauses only. This study extends the empirical domain by testing functional morphemes in English nominal phrases in aphasia and comparing patients’ impairment to their impairment of functional morphemes in English clauses. In the linguistics literature, it is assumed that clauses and nominal phrases are structurally parallel but exhibit inflectional differences. The results of the present study indicated that aphasic speakers evinced similar impairment patterns in clauses and nominal phrases. These findings are consistent with the Distributed Morphology Hypothesis (DMH), suggesting that the source of functional morphology deficits among agrammatics relates to difficulty implementing rules that convert inflectional features into morphemes. Our findings, however, are inconsistent with the Tree Pruning Hypothesis (TPH), which suggests that patients have difficulty building complex hierarchical structures. PMID:26379370

  6. Parallel functional category deficits in clauses and nominal phrases: The case of English agrammatism.

    PubMed

    Wang, Honglei; Yoshida, Masaya; Thompson, Cynthia K

    2014-01-01

    Individuals with agrammatic aphasia exhibit restricted patterns of impairment of functional morphemes; however, the syntactic characterization of the impairment is controversial. Previous studies have focused on functional morphology in clauses only. This study extends the empirical domain by testing functional morphemes in English nominal phrases in aphasia and comparing patients' impairment to their impairment of functional morphemes in English clauses. In the linguistics literature, it is assumed that clauses and nominal phrases are structurally parallel but exhibit inflectional differences. The results of the present study indicated that aphasic speakers evinced similar impairment patterns in clauses and nominal phrases. These findings are consistent with the Distributed Morphology Hypothesis (DMH), suggesting that the source of functional morphology deficits among agrammatics relates to difficulty implementing rules that convert inflectional features into morphemes. Our findings, however, are inconsistent with the Tree Pruning Hypothesis (TPH), which suggests that patients have difficulty building complex hierarchical structures.

  7. A parallel approach of COFFEE objective function to multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.

    2015-09-01

    Computational tools to assist genomic analyses are increasingly necessary due to the rapidly growing amount of available data. Given the high computational cost of deterministic algorithms for sequence alignment, many works concentrate on the development of heuristic approaches to multiple sequence alignment. However, selecting an approach that offers solutions with good biological significance and feasible execution time is a great challenge. Thus, this work presents the parallelization of the processing steps of the MSA-GA tool, using the multithread paradigm in the execution of the COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sets of sequences with low similarity are aligned. In previous studies we implemented the COFFEE objective function in the tool to smooth these distortions. Although the nature of the COFFEE objective function implies an increase in execution time, the approach contains steps that can be executed in parallel. With the improvements implemented in this work, the new approach is 24% faster than the sequential approach with COFFEE. Moreover, the multithreaded COFFEE approach is more efficient than WSP: besides being slightly faster, it produces better biological results.
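
    The parallelizable step is the pairwise scoring inside the objective function. The hedged sketch below illustrates the idea with a toy identity score distributed over a thread pool; the real COFFEE consistency score is not reproduced here, and in CPython an actual speedup would require a GIL-releasing scoring kernel or worker processes.

    ```python
    # Toy multithreaded objective function: each pair of aligned rows is
    # scored in a worker thread. The identity score below is a placeholder,
    # not the actual COFFEE consistency score.
    from concurrent.futures import ThreadPoolExecutor
    from itertools import combinations

    def pair_score(a, b):
        # fraction of columns where both rows carry the same residue
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        aligned = sum(1 for x, y in zip(a, b) if x != "-" and y != "-")
        return matches / aligned if aligned else 0.0

    def objective(alignment, workers=4):
        pairs = list(combinations(alignment, 2))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            scores = list(pool.map(lambda p: pair_score(*p), pairs))
        return sum(scores) / len(scores)

    alignment = ["ACG-T", "ACGGT", "A-GGT"]
    print(objective(alignment))
    ```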

  8. Superresolution parallel magnetic resonance imaging: Application to functional and spectroscopic imaging

    PubMed Central

    Otazo, Ricardo; Lin, Fa-Hsuan; Wiggins, Graham; Jordan, Ramiro; Sodickson, Daniel; Posse, Stefan

    2009-01-01

    Standard parallel magnetic resonance imaging (MRI) techniques suffer from residual aliasing artifacts when the coil sensitivities vary within the image voxel. In this work, a parallel MRI approach known as Superresolution SENSE (SURE-SENSE) is presented in which acceleration is performed by acquiring only the central region of k-space instead of increasing the sampling distance over the complete k-space matrix and reconstruction is explicitly based on intra-voxel coil sensitivity variation. In SURE-SENSE, parallel MRI reconstruction is formulated as a superresolution imaging problem where a collection of low resolution images acquired with multiple receiver coils are combined into a single image with higher spatial resolution using coil sensitivities acquired with high spatial resolution. The effective acceleration of conventional gradient encoding is given by the gain in spatial resolution, which is dictated by the degree of variation of the different coil sensitivity profiles within the low resolution image voxel. Since SURE-SENSE is an ill-posed inverse problem, Tikhonov regularization is employed to control noise amplification. Unlike standard SENSE, for which acceleration is constrained to the phase-encoding dimension/s, SURE-SENSE allows acceleration along all encoding directions — for example, two-dimensional acceleration of a 2D echo-planar acquisition. SURE-SENSE is particularly suitable for low spatial resolution imaging modalities such as spectroscopic imaging and functional imaging with high temporal resolution. Application to echo-planar functional and spectroscopic imaging in human brain is presented using two-dimensional acceleration with a 32-channel receiver coil. PMID:19341804
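
    The Tikhonov step referred to above amounts to a regularized linear least-squares solve. The following generic sketch uses a toy 1D random encoding matrix, not the actual SURE-SENSE operator built from high-resolution coil sensitivities, and simply shows the closed-form regularized solution.

    ```python
    # Generic Tikhonov-regularized reconstruction on a toy 1D problem; the
    # matrix E stands in for the SURE-SENSE encoding operator, which is not
    # modeled here.
    import numpy as np

    def tikhonov_solve(E, y, lam):
        """Solve min_x ||E x - y||^2 + lam^2 ||x||^2 in closed form."""
        EhE = E.conj().T @ E
        return np.linalg.solve(EhE + lam**2 * np.eye(E.shape[1]), E.conj().T @ y)

    rng = np.random.default_rng(0)
    x_true = rng.standard_normal(64)                 # unknown image (1D toy)
    E = rng.standard_normal((32, 64))                # underdetermined encoding
    y = E @ x_true + 0.05 * rng.standard_normal(32)  # noisy measurements
    x_hat = tikhonov_solve(E, y, lam=1.0)
    print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
    ```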

  9. Parallel Execution of Functional Mock-up Units in Buildings Modeling

    SciTech Connect

    Ozmen, Ozgur; Nutaro, James J.; New, Joshua Ryan

    2016-06-30

    A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when a large number of states and state events are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up the parallel execution of multiple FMUs.
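
    Conceptually, the solver is a single explicit Euler loop whose per-component derivative evaluations are mutually independent. The Python sketch below marks that parallelizable region on a toy multi-tank system; the report's actual implementation wraps FMUs in C++ and parallelizes the loop with OpenMP, none of which is modeled here.

    ```python
    # Conceptual Euler co-simulation loop: the per-component derivative
    # evaluations are independent, which is the region the report
    # parallelizes with OpenMP (shown serially here).
    def euler_cosimulate(components, states, t_end, dt):
        """components: list of f(t, x) -> dx/dt, one per FMU-like unit."""
        t = 0.0
        while t < t_end:
            # --- parallelizable region: components do not interact here ---
            derivs = [f(t, x) for f, x in zip(components, states)]
            # --------------------------------------------------------------
            states = [x + dt * dx for x, dx in zip(states, derivs)]
            t += dt
        return states

    # toy multi-tank system: each tank drains in proportion to its level
    tanks = [lambda t, x, k=k: -0.1 * k * x for k in (1, 2, 3)]
    print(euler_cosimulate(tanks, [1.0, 1.0, 1.0], t_end=5.0, dt=0.01))
    ```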

  10. Finding zeros of nonlinear functions using the hybrid parallel cell mapping method

    NASA Astrophysics Data System (ADS)

    Xiong, Fu-Rui; Schütze, Oliver; Ding, Qian; Sun, Jian-Qiao

    2016-05-01

    Analysis of nonlinear dynamical systems including finding equilibrium states and stability boundaries often leads to a problem of finding zeros of vector functions. However, finding all the zeros of a set of vector functions in the domain of interest is quite a challenging task. This paper proposes a zero finding algorithm that combines the cell mapping methods and the subdivision techniques. Both the simple cell mapping (SCM) and generalized cell mapping (GCM) methods are used to identify a covering set of zeros. The subdivision technique is applied to enhance the solution resolution. The parallel implementation of the proposed method is discussed extensively. Several examples are presented to demonstrate the application and effectiveness of the proposed method. We then extend the study of finding zeros to the problem of finding stability boundaries of potential fields. Examples of two and three dimensional potential fields are studied. In addition to the effectiveness in finding the stability boundaries, the proposed method can handle several millions of cells in just a few seconds with the help of parallel computing in graphics processing units (GPUs).
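
    A heavily simplified, hedged sketch of the cell subdivision idea follows: cells whose corner values suggest a zero crossing in every component are kept and refined, while the rest are discarded. The corner test is only a heuristic stand-in for the paper's SCM/GCM machinery, and the GPU parallelism is omitted.

    ```python
    # Heuristic cell-subdivision sketch: a cell is kept when every component
    # of f spans zero over its corners, then surviving cells are refined.
    import numpy as np

    def f(p):                     # example field with zeros where the unit
        x, y = p                  # circle meets the line x = y
        return np.array([x**2 + y**2 - 1.0, x - y])

    def may_contain_zero(lo, hi):
        corners = [np.array([a, b]) for a in (lo[0], hi[0])
                                    for b in (lo[1], hi[1])]
        vals = np.array([f(c) for c in corners])
        return all(vals[:, i].min() <= 0 <= vals[:, i].max() for i in range(2))

    def refine(cells, depth):
        for _ in range(depth):
            nxt = []
            for lo, hi in cells:
                mid = 0.5 * (lo + hi)
                for qlo, qhi in (((lo[0], lo[1]), (mid[0], mid[1])),
                                 ((mid[0], lo[1]), (hi[0], mid[1])),
                                 ((lo[0], mid[1]), (mid[0], hi[1])),
                                 ((mid[0], mid[1]), (hi[0], hi[1]))):
                    qlo, qhi = np.array(qlo), np.array(qhi)
                    if may_contain_zero(qlo, qhi):
                        nxt.append((qlo, qhi))
            cells = nxt
        return cells

    cells = refine([(np.array([-2.0, -2.0]), np.array([2.0, 2.0]))], depth=8)
    print([0.5 * (lo + hi) for lo, hi in cells])  # near (+-1/sqrt(2), +-1/sqrt(2))
    ```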

  11. The Gut Metagenome Changes in Parallel to Waist Circumference, Brain Iron Deposition, and Cognitive Function.

    PubMed

    Blasco, Gerard; Moreno-Navarrete, José Maria; Rivero, Mireia; Pérez-Brocal, Vicente; Garre-Olmo, Josep; Puig, Josep; Daunis-I-Estadella, Pepus; Biarnés, Carles; Gich, Jordi; Fernández-Aranda, Fernando; Alberich-Bayarri, Ángel; Moya, Andrés; Pedraza, Salvador; Ricart, Wifredo; López, Miguel; Portero-Otin, Manuel; Fernandez-Real, José-Manuel

    2017-08-01

    Microbiota perturbations seem to exert modulatory effects on emotional behavior and on stress- and pain-modulation systems in adult animals; however, limited information is available in humans. The objective was to study potential relationships among the gut metagenome, brain microstructure, and cognitive performance in middle-aged, apparently healthy, obese and nonobese subjects after weight changes. This was a longitudinal study over a 2-year period, conducted at a tertiary public hospital with thirty-five apparently healthy subjects (18 obese). Diet counseling was provided to all subjects, and obese subjects were followed every 6 months. The main measures were brain relaxometry (using magnetic resonance R2*), cognitive performance (by means of cognitive tests), and gut microbiome composition (shotgun sequencing). R2* increased in both obese and nonobese subjects, independent of weight variations. Changes in waist circumference, but not in body mass index, were associated with brain iron deposition (R2*) in the striatum, amygdala, and hippocampus, in parallel with visual-spatial constructional ability and circulating beta amyloid (Aβ42) levels. These changes were linked to shifts in the gut microbiome, in which the relative abundance of bacteria belonging to the Caldiserica and Thermodesulfobacteria phyla was reciprocally associated with raised R2* in different brain nuclei. Of note, the increase in bacteria belonging to the Tenericutes phylum paralleled decreased R2* gain in the striatum, serum Aβ42 levels, and spared visual-spatial constructional ability. Interestingly, metagenome functions associated with circulating and brain iron stores are involved in bacterial generation of siderophores. Changes in the gut metagenome are associated longitudinally with cognitive function and brain iron deposition.

  12. Introducing ONETEP: linear-scaling density functional simulations on parallel computers.

    PubMed

    Skylaris, Chris-Kriton; Haynes, Peter D; Mostofi, Arash A; Payne, Mike C

    2005-02-22

    We present ONETEP (order-N electronic total energy package), a density functional program for parallel computers whose computational cost scales linearly with the number of atoms and the number of processors. ONETEP is based on our reformulation of the plane wave pseudopotential method which exploits the electronic localization that is inherent in systems with a nonvanishing band gap. We summarize the theoretical developments that enable the direct optimization of strictly localized quantities expressed in terms of a delocalized plane wave basis. These same localized quantities lead us to a physical way of dividing the computational effort among many processors to allow calculations to be performed efficiently on parallel supercomputers. We show with examples that ONETEP achieves excellent speedups with increasing numbers of processors and confirm that the time taken by ONETEP as a function of increasing number of atoms for a given number of processors is indeed linear. What distinguishes our approach is that the localization is achieved in a controlled and mathematically consistent manner so that ONETEP obtains the same accuracy as conventional cubic-scaling plane wave approaches and offers fast and stable convergence. We expect that calculations with ONETEP have the potential to provide quantitative theoretical predictions for problems involving thousands of atoms such as those often encountered in nanoscience and biophysics.

  13. Line-field parallel swept source MHz OCT for structural and functional retinal imaging

    PubMed Central

    Fechtig, Daniel J.; Grajciar, Branislav; Schmoll, Tilman; Blatter, Cedric; Werkmeister, Rene M.; Drexler, Wolfgang; Leitgeb, Rainer A.

    2015-01-01

    We demonstrate three-dimensional structural and functional retinal imaging with line-field parallel swept source imaging (LPSI) at acquisition speeds of up to 1 MHz equivalent A-scan rate with sensitivity better than 93.5 dB at a central wavelength of 840 nm. The results demonstrate competitive sensitivity, speed, image contrast and penetration depth when compared to conventional point scanning OCT. LPSI allows high-speed retinal imaging of function and morphology with commercially available components. We further demonstrate a method that mitigates the effect of the lateral Gaussian intensity distribution across the line focus and demonstrate and discuss the feasibility of high-speed optical angiography for visualization of the retinal microcirculation. PMID:25798298

  14. Line-field parallel swept source MHz OCT for structural and functional retinal imaging.

    PubMed

    Fechtig, Daniel J; Grajciar, Branislav; Schmoll, Tilman; Blatter, Cedric; Werkmeister, Rene M; Drexler, Wolfgang; Leitgeb, Rainer A

    2015-03-01

    We demonstrate three-dimensional structural and functional retinal imaging with line-field parallel swept source imaging (LPSI) at acquisition speeds of up to 1 MHz equivalent A-scan rate with sensitivity better than 93.5 dB at a central wavelength of 840 nm. The results demonstrate competitive sensitivity, speed, image contrast and penetration depth when compared to conventional point scanning OCT. LPSI allows high-speed retinal imaging of function and morphology with commercially available components. We further demonstrate a method that mitigates the effect of the lateral Gaussian intensity distribution across the line focus and demonstrate and discuss the feasibility of high-speed optical angiography for visualization of the retinal microcirculation.

  15. Parallelization of the integral equation formulation of the polarizable continuum model for higher-order response functions

    NASA Astrophysics Data System (ADS)

    Ferrighi, Lara; Frediani, Luca; Fossgaard, Eirik; Ruud, Kenneth

    2006-10-01

    We present a parallel implementation of the integral equation formalism of the polarizable continuum model for Hartree-Fock and density functional theory calculations of energies and linear, quadratic, and cubic response functions. The contributions to the free energy of the solute due to the polarizable continuum have been implemented using a master-slave approach with load balancing to ensure good scalability also on parallel machines with a slow interconnect. We demonstrate the good scaling behavior of the code through calculations of Hartree-Fock energies and linear, quadratic, and cubic response functions for a modest-sized sample molecule. We also explore the behavior of the parallelization of the integral equation formulation of the polarizable continuum model code when used in conjunction with a recent scheme for the storage of two-electron integrals in the memory of the different slaves in order to achieve superlinear scaling in the parallel calculations.
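
    The master-slave pattern with load balancing generalizes readily: uneven work units are dispatched dynamically to whichever worker is idle rather than being statically partitioned. The toy sketch below (our own task function, local processes instead of the paper's MPI-style slaves) illustrates the scheduling idea only.

    ```python
    # Toy dynamic master-worker dispatch: uneven chunks of work go to
    # whichever process is free, instead of being statically partitioned.
    from concurrent.futures import ProcessPoolExecutor, as_completed
    import math

    def task(n):
        # stand-in for one uneven chunk of numerical work
        return n, sum(math.sin(i) for i in range(n))

    if __name__ == "__main__":
        sizes = [10_000, 2_000_000, 50_000, 1_500_000, 30_000, 800_000]
        with ProcessPoolExecutor(max_workers=3) as pool:
            futures = [pool.submit(task, n) for n in sizes]
            for fut in as_completed(futures):  # results arrive as workers finish
                n, s = fut.result()
                print(f"chunk of size {n}: partial sum {s:.3f}")
    ```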

  16. Parallelization of the integral equation formulation of the polarizable continuum model for higher-order response functions.

    PubMed

    Ferrighi, Lara; Frediani, Luca; Fossgaard, Eirik; Ruud, Kenneth

    2006-10-21

    We present a parallel implementation of the integral equation formalism of the polarizable continuum model for Hartree-Fock and density functional theory calculations of energies and linear, quadratic, and cubic response functions. The contributions to the free energy of the solute due to the polarizable continuum have been implemented using a master-slave approach with load balancing to ensure good scalability also on parallel machines with a slow interconnect. We demonstrate the good scaling behavior of the code through calculations of Hartree-Fock energies and linear, quadratic, and cubic response functions for a modest-sized sample molecule. We also explore the behavior of the parallelization of the integral equation formulation of the polarizable continuum model code when used in conjunction with a recent scheme for the storage of two-electron integrals in the memory of the different slaves in order to achieve superlinear scaling in the parallel calculations.

  17. Large-scale parallel surface functionalization of goblet-type whispering gallery mode microcavity arrays for biosensing applications.

    PubMed

    Bog, Uwe; Brinkmann, Falko; Kalt, Heinz; Koos, Christian; Mappes, Timo; Hirtz, Michael; Fuchs, Harald; Köber, Sebastian

    2014-10-15

    A novel surface functionalization technique is presented for large-scale selective molecule deposition onto whispering gallery mode microgoblet cavities. The parallel technique allows damage-free individual functionalization of the cavities, arranged on-chip in densely packed arrays. A glass slide bearing phospholipids with different functional head groups serves as the stamp pad. Coated microcavities are characterized and demonstrated as biosensors.

  18. A parallel approach for image segmentation by numerical minimization of a second-order functional

    NASA Astrophysics Data System (ADS)

    Zanella, Riccardo; Zanetti, Massimo; Ruggiero, Valeria

    2016-10-01

    Because of its attractive features, image segmentation has been shown to be a promising tool in remote sensing. A known drawback of its implementation is computational complexity. Recently in [1] an efficient numerical method has been proposed for the minimization of a second-order variational approximation of the Blake-Zisserman functional. The method is an especially tailored version of the block-coordinate descent algorithm (BCDA). In order to enable the segmentation of large-size gridded data, such as Digital Surface Models, we combine a domain decomposition technique with BCDA and a parallel interconnection rule among blocks of variables. We aim to show that a simple tiling strategy enables us to treat large images even on a commodity multicore CPU, with no need for specific post-processing on tile junctions. From the point of view of performance, little computational effort is required to separate the data into subdomains, and the running time is mainly spent in concurrently solving the independent subproblems. Numerical results are provided to evaluate the effectiveness of the proposed parallel approach.
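
    The tiling strategy can be pictured as splitting the image into independent blocks that are processed concurrently and reassembled. The sketch below uses a placeholder per-tile smoother and ignores halo exchange, so unlike BCDA with the parallel interconnection rule it would leave seams at tile junctions; it shows only the decomposition pattern.

    ```python
    # Decomposition pattern only: the image is split into row blocks that are
    # processed concurrently and stacked back together. The per-tile smoother
    # is a placeholder, and halo exchange at tile junctions is ignored.
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def smooth_tile(tile):
        # placeholder for solving one independent subproblem on a tile
        out = tile.copy()
        out[1:-1, 1:-1] = 0.25 * (tile[:-2, 1:-1] + tile[2:, 1:-1]
                                  + tile[1:-1, :-2] + tile[1:-1, 2:])
        return out

    def process_tiled(img, tiles=4):
        rows = np.array_split(img, tiles, axis=0)
        with ProcessPoolExecutor() as pool:
            return np.vstack(list(pool.map(smooth_tile, rows)))

    if __name__ == "__main__":
        img = np.random.rand(1024, 1024)
        print(process_tiled(img).shape)
    ```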

  19. Functional renormalization group study of parallel double quantum dots: Effects of asymmetric dot-lead couplings

    NASA Astrophysics Data System (ADS)

    Protsenko, V. S.; Katanin, A. A.

    2017-06-01

    We explore the effects of asymmetry of the hopping parameters between double parallel quantum dots and the leads on the conductance and on the possibility of local magnetic moment formation in this system, using a functional renormalization group approach with a counterterm. We demonstrate the possibility of a quantum phase transition to a local-moment regime [the so-called singular Fermi liquid (SFL) state] for various types of hopping asymmetries and discuss the respective gate voltage dependencies of the conductance. We show that, depending on the type of asymmetry, the system can demonstrate either a first-order quantum phase transition to an SFL state, accompanied by a discontinuous change of the conductance, similarly to the symmetric case, or a second-order quantum phase transition, in which the conductance is continuous and exhibits a Fano-type asymmetric resonance near the transition point. A semianalytical explanation of these different types of conductance behavior is presented.

  20. Parallel Loss-of-Function at the RPM1 Bacterial Resistance Locus in Arabidopsis thaliana

    PubMed Central

    Rose, Laura; Atwell, Susanna; Grant, Murray; Holub, Eric B.

    2012-01-01

    Dimorphism at the Resistance to Pseudomonas syringae pv. maculicola 1 (RPM1) locus is well documented in natural populations of Arabidopsis thaliana and has been portrayed as a long-term balanced polymorphism. The haplotype from resistant plants contains the RPM1 gene, which enables these plants to recognize at least two structurally unrelated bacterial effector proteins (AvrB and AvrRpm1) from bacterial crop pathogens. A complete deletion of the RPM1 coding sequence has been interpreted as a single event resulting in susceptibility in these individuals. Consequently, the ability to revert to resistance or for alternative R-gene specificities to evolve at this locus has also been lost in these individuals. Our survey of variation at the RPM1 locus in a large species-wide sample of A. thaliana has revealed four new loss-of-function alleles that contain most of the intervening sequence of the RPM1 open reading frame. Multiple loss-of-function alleles may have originated due to the reported intrinsic cost to plants expressing the RPM1 protein. The frequency and geographic distribution of rpm1 alleles observed in our survey indicate the parallel origin and maintenance of these loss-of-function mutations and reveal a more complex history of natural selection at this locus than previously thought. PMID:23272006

  1. Depth estimation via parallel coevolution of disparity functions for area-based stereo

    NASA Astrophysics Data System (ADS)

    Liatsis, Panos; Goulermas, John Y.

    2001-02-01

    A novel system for depth estimation is proposed with the use of Symbiotic Genetic Algorithms for the continuous problem of disparity surface approximation. The approach is based on the decomposition of the entire surface into very small non-overlapping patches described by low-order bivariate polynomials and the use of symbiotic optimization to enforce smoothness at the boundaries of these patches, so that the entire surface can be approximated in a smooth piecewise fashion by functionals of local support. Such optimization is amenable to a massively parallel implementation, since each patch is optimized by a different execution unit and each unit communicates through its cost function only with its four-connected neighbors. The method makes use of various existing crossover and mutation schemes for real-valued chromosome representations and a new problem-specific mechanism for generating and hybridizing the initial populations. The proposed multi-objective cost function enforces photometric similarity and smoothness between the patch boundaries at a local scale, which in the long term gives rise to a globally smooth disparity surface.

  2. Temporal increase in thymocyte negative selection parallels enhanced thymic SIRPα(+) DC function.

    PubMed

    Kroger, Charles J; Wang, Bo; Tisch, Roland

    2016-10-01

    Dysregulation of negative selection contributes to T-cell-mediated autoimmunity, such as type 1 diabetes. The events regulating thymic negative selection, however, are ill defined. Work by our group and others suggests that negative selection is inefficient early in ontogeny and increases with age. This study examines temporal changes in negative selection and the thymic DC compartment. Peptide-induced thymocyte deletion in vivo was reduced in newborn versus 4-week-old NOD mice, despite a similar sensitivity of the respective thymocytes to apoptosis induction. The temporal increase in negative selection corresponded with an elevated capacity of thymic antigen-presenting cells to stimulate T cells, along with altered subset composition and function of resident DC. The frequency of signal regulatory protein α(+) (SIRPα(+)) and plasmacytoid DCs was increased concomitant with a decrease in CD8α(+) DC in 4-week-old NOD thymi. Importantly, 4-week-old versus newborn thymic SIRPα(+) DC exhibited increased antigen processing and presentation via the MHC class II but not class I pathway, coupled with an enhanced T-cell stimulatory capacity not seen in thymic plasmacytoid DC and CD8α(+) DC. These findings indicate that the efficiency of thymic DC-mediated negative selection is limited early after birth and increases with age, paralleling the expansion of functionally superior thymic SIRPα(+) DC.

  3. Functionalized Polymers-Emerging Versatile Tools for Solution-Phase Chemistry and Automated Parallel Synthesis.

    PubMed

    Kirschning, Andreas; Monenschein, Holger; Wittenberg, Rüdiger

    2001-02-16

    As part of the dramatic changes associated with the need for preparing compound libraries in pharmaceutical and agrochemical research laboratories, industry searches for new technologies that allow for the automation of synthetic processes. Since the pioneering work by Merrifield, polymeric supports have been identified as playing a key role in this field; however, polymer-assisted solution-phase synthesis, which utilizes immobilized reagents and catalysts, has only recently begun to flourish. Polymer-assisted solution-phase synthesis has various advantages over conventional solution-phase chemistry, such as the ease of separation of the supported species from a reaction mixture by filtration and washing, the opportunity to use an excess of the reagent to force the reaction to completion without causing workup problems, and the adaptability to continuous-flow processes. Various strategies for employing functionalized polymers stoichiometrically have been developed. Apart from reagents that are covalently or ionically attached to the polymeric backbone and which are released into solution in the presence of a suitable substrate, scavenger reagents play an increasingly important role in purifying reaction mixtures. Employing functionalized polymers in solution-phase synthesis has been shown to be extremely useful in automated parallel synthesis and multistep sequences. So far, compound libraries containing as many as 88 members have been generated by using several polymer-bound reagents one after another. Furthermore, it has been demonstrated that complex natural products like the alkaloids (+/-)-oxomaritidine and (+/-)-epimaritidine can be prepared by a sequence of five and six consecutive polymer-assisted steps, respectively, and the potent analgesic compound (+/-)-epibatidine in twelve linear steps, ten of which are based on functionalized polymers. These developments reveal the great future prospects of polymer-assisted solution-phase synthesis.

  4. Parallel-META 2.0: enhanced metagenomic data analysis with functional annotation, high performance computing and advanced visualization.

    PubMed

    Su, Xiaoquan; Pan, Weihua; Song, Baoxing; Xu, Jian; Ning, Kang

    2014-01-01

    The metagenomic method directly sequences and analyses genome information from microbial communities. The main computational tasks for metagenomic analyses include taxonomical and functional structure analysis for all genomes in a microbial community (also referred to as a metagenomic sample). With the advancement of Next Generation Sequencing (NGS) techniques, the number of metagenomic samples and the data size for each sample are increasing rapidly. Current metagenomic analysis is both data- and computation-intensive, especially when there are many species in a metagenomic sample and each has a large number of sequences. As such, metagenomic analyses require extensive computational power. The increasing analytical requirements further augment the challenges for computational analysis. In this work, we propose Parallel-META 2.0, a metagenomic analysis software package, to cope with such needs for efficient and fast analyses of the taxonomical and functional structures of microbial communities. Parallel-META 2.0 is an extended and improved version of Parallel-META 1.0, which enhances the taxonomical analysis using multiple databases, improves computation efficiency by optimized parallel computing, and supports interactive visualization of results in multiple views. Furthermore, it enables functional analysis for metagenomic samples, including short-read assembly, gene prediction and functional annotation. Therefore, it can provide accurate taxonomical and functional analyses of metagenomic samples in a high-throughput manner and on a large scale.

  5. Interaction-induced local moments in parallel quantum dots within the functional renormalization group approach

    NASA Astrophysics Data System (ADS)

    Protsenko, V. S.; Katanin, A. A.

    2016-11-01

    We propose a version of the functional renormalization-group (fRG) approach which, due to the inclusion of a Litim-type cutoff and the switching off (or reduction) of the magnetic field during the fRG flow, is capable of describing a singular Fermi-liquid (SFL) phase, formed due to the presence of local moments in quantum dot structures. The proposed scheme allows one to describe the first-order quantum phase transition from the "singular" to the "regular" paramagnetic phase with applied gate voltage for parallel quantum dots symmetrically coupled to leads, and shows sizable spin splitting of the electronic states in the SFL phase in the limit of vanishing magnetic field H → 0; the calculated conductance shows good agreement with the results of the numerical renormalization group. Using the proposed fRG approach with the counterterm, we also show that for asymmetric coupling of the leads to the dots the SFL behavior similar to that of the symmetric case persists, but with occupation numbers, effective energy levels, and conductance changing continuously through the quantum phase transition into the SFL phase.

  6. Parallel perceptual/cognitive functions in humans and rats: space and time.

    PubMed

    Tees, R C; Buhrmann, K

    1989-06-01

    The nature of the evidence on the role played by early stimulation history in perceptual development related to an appreciation of intermodal attributes involving space and time is reviewed. In conjunction with this analysis, an examination was undertaken of the effect of early visual deprivation on the ability of dark- (DR) and light-reared (LR) rats to learn discriminations involving location of sounds or lights and to abstract the intersensory correspondence involved from the initial modality-specific training. Visually inexperienced DR rats were somewhat slower to acquire a discrimination involving the location of visual events under some stimulus/response arrangements. More importantly, such animals were not as effective as their visually experienced LR counterparts in demonstrating cross-modal transfer (CMT) to signals in a new modality. The present study also revealed that CMT involving location of signals was less salient than CMT of duration information in rats regardless of their rearing condition. Finally, findings are discussed more generally, providing contextual information that bears on issues related to parallel cognitive functions in rats and human neonates and on the role of early visual experience in the ontogeny of intersensory perceptual competence in mammals.

  7. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.

    PubMed

    Natarajan, Chandrasekhar; Projecto-Garcia, Joana; Moriyama, Hideaki; Weber, Roy E; Muñoz-Fuentes, Violeta; Green, Andy J; Kopuchian, Cecilia; Tubaro, Pablo L; Alza, Luis; Bulgarella, Mariana; Smith, Matthew M; Wilson, Robert E; Fago, Angela; McCracken, Kevin G; Storz, Jay F

    2015-12-01

    A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level.

  8. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level

    PubMed Central

    Natarajan, Chandrasekhar; Projecto-Garcia, Joana; Moriyama, Hideaki; Weber, Roy E.; Muñoz-Fuentes, Violeta; Green, Andy J.; Kopuchian, Cecilia; Tubaro, Pablo L.; Alza, Luis; Bulgarella, Mariana; Smith, Matthew M.; Wilson, Robert E.; Fago, Angela; McCracken, Kevin G.; Storz, Jay F.

    2015-01-01

    A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level. PMID:26637114

  9. Parallel phase-shifting digital holography with adaptive function using phase-mode spatial light modulator.

    PubMed

    Lin, Miao; Nitta, Kouichi; Matoba, Osamu; Awatsuji, Yasuhiro

    2012-05-10

    Parallel phase-shifting digital holography using a phase-mode spatial light modulator (SLM) is proposed. The phase-mode SLM implements the spatial distribution of phase retardation required in parallel phase-shifting digital holography. This SLM can also dynamically compensate for phase distortion caused by optical elements such as beam splitters, lenses, and air fluctuation. An experimental demonstration using a static object is presented.
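
    The reconstruction arithmetic behind (parallel) phase-shifting holography is the standard four-step formula. In the sketch below we assume the four phase-shifted interferograms have already been separated into individual arrays, which sidesteps the pixelated single-shot multiplexing that the SLM performs in practice.

    ```python
    # Four-step phase-shifting reconstruction on synthetic data, assuming the
    # four interferograms are already separated (demosaicing not shown).
    import numpy as np

    def reconstruct(I0, I90, I180, I270):
        """Object wave from four frames, exact for a unit reference beam."""
        return ((I0 - I180) + 1j * (I90 - I270)) / 4.0

    x = np.linspace(-1, 1, 256)
    obj = 0.5 * np.exp(1j * 8.0 * np.outer(x, x))     # synthetic phase object
    frames = [np.abs(np.exp(1j * d) + obj) ** 2       # reference shifted by d
              for d in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]
    rec = reconstruct(*frames)
    print(np.max(np.abs(rec - obj)))                  # ~0 up to rounding
    ```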

  10. High-throughput optogenetic functional magnetic resonance imaging with parallel computations

    PubMed Central

    Fang, Zhongnan; Lee, Jin Hyung

    2013-01-01

    Optogenetic functional magnetic resonance imaging (ofMRI) technology enables cell-type specific, temporally precise neuronal control and accurate, in vivo readout of resulting activity across the whole brain. With the ability to precisely control excitation and inhibition parameters, and to accurately record the resulting activity, there is an increased need for a high-throughput method to bring ofMRI studies to their full potential. In this paper, an advanced system that can allow real-time fMRI with interactive control and analysis in a fraction of the MRI acquisition repetition time (TR) is proposed. With such high processing speed, sufficient time will be available for integration of future developments that can further enhance ofMRI data quality or better streamline the study. We designed and implemented a highly optimized, massively parallel system using graphics processing units (GPUs), which achieves reconstruction, motion correction, and analysis of 3D volume data in approximately 12.80 ms. As a result, with a 750 ms TR and 4 interleaf fMRI acquisition, we can now conduct sliding window reconstruction, motion correction, analysis and display in approximately 1.7% of the TR. Therefore, a significant amount of time can now be allocated to integrating advanced but computationally intensive methods that can enable higher image quality and better analysis results all within a TR. Utilizing the proposed high-throughput imaging platform with sliding window reconstruction, we were also able to observe the much-debated initial dips in our ofMRI data. Combined with methods to further improve SNR, the proposed system will enable efficient real-time, interactive, high-throughput ofMRI studies. PMID:23747482
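
    The sliding-window part is easy to state concretely: with a 4-interleaf acquisition, each newly acquired interleaf is combined with the previous three, so a complete volume is available every interleaf period. The toy sketch below shows that bookkeeping only; GPU reconstruction, gridding, and motion correction are far beyond it.

    ```python
    # Sliding-window bookkeeping for a 4-interleaf acquisition: each new
    # interleaf is combined with the previous three, yielding one complete
    # frame per interleaf period. Reconstruction itself is not modeled.
    import numpy as np

    def sliding_window_frames(interleaves, window=4):
        for i in range(window - 1, len(interleaves)):
            yield sum(interleaves[i - window + 1 : i + 1])

    rng = np.random.default_rng(1)
    full = rng.standard_normal((64, 64))
    masks = [np.zeros((64, 64)) for _ in range(4)]
    for k, m in enumerate(masks):
        m[k::4, :] = 1.0                     # each interleaf fills every 4th row
    stream = [full * masks[t % 4] for t in range(12)]
    frames = list(sliding_window_frames(stream))
    print(len(frames), np.allclose(frames[0], full))  # 9 frames, each complete
    ```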

  11. High-throughput optogenetic functional magnetic resonance imaging with parallel computations.

    PubMed

    Fang, Zhongnan; Lee, Jin Hyung

    2013-09-15

    Optogenetic functional magnetic resonance imaging (ofMRI) technology enables cell-type-specific, temporally precise neuronal control and the accurate, in vivo readout of the resulting activity across the entire brain. With the ability to precisely control excitation and inhibition parameters and accurately record the resulting activity, there is an increased need for a high-throughput method to bring ofMRI studies to their full potential. In this paper, an advanced system facilitating real-time fMRI with interactive control and analysis in a fraction of the MRI acquisition repetition time (TR) is proposed. With such high processing speed, sufficient time will be available for the integration of future developments that further enhance ofMRI data or streamline the study. We designed and implemented a highly optimised, massively parallel system using graphics processing units (GPUs), which achieves the reconstruction, motion correction, and analysis of 3D volume data in approximately 12.80 ms. As a result, with a 750 ms TR and 4 interleaf fMRI acquisition, we can now conduct sliding window reconstruction, motion correction, analysis and display in approximately 1.7% of the TR. Therefore, a significant amount of time can now be allocated to integrating advanced but computationally intensive methods that improve image quality and enhance the analysis results within a TR. Utilising the proposed high-throughput imaging platform with sliding window reconstruction, we were also able to observe the much-debated initial dips in our ofMRI data. Combined with methods to further improve SNR, the proposed system will enable efficient real-time, interactive, high-throughput ofMRI studies.

  12. Energy distribution functions of kilovolt ions parallel and perpendicular to the magnetic field of a modified Penning discharge

    NASA Technical Reports Server (NTRS)

    Roth, R. J.

    1973-01-01

    The distribution function of ion energy parallel to the magnetic field of a modified Penning discharge has been measured with a retarding potential energy analyzer. These ions escaped through one of the throats of the magnetic mirror geometry. Simultaneous measurements of the ion energy distribution function perpendicular to the magnetic field have been made with a charge exchange neutral detector. The ion energy distribution functions are approximately Maxwellian, and the parallel and perpendicular kinetic temperatures are equal within experimental error. These results suggest that turbulent processes previously observed in this discharge Maxwellianize the velocity distribution along a radius in velocity space and cause an isotropic energy distribution. When the distributions depart from Maxwellian, they are enhanced above the Maxwellian tail.

  13. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.

  14. Parallel processing in the honeybee olfactory pathway: structure, function, and evolution.

    PubMed

    Rössler, Wolfgang; Brill, Martin F

    2013-11-01

    Animals face highly complex and dynamic olfactory stimuli in their natural environments, which require fast and reliable olfactory processing. Parallel processing is a common principle of sensory systems supporting this task, for example in visual and auditory systems, but its role in olfaction remained unclear. Studies in the honeybee focused on a dual olfactory pathway. Two sets of projection neurons connect glomeruli in two antennal-lobe hemilobes via lateral and medial tracts in opposite sequence with the mushroom bodies and lateral horn. Comparative studies suggest that this dual-tract circuit represents a unique adaptation in Hymenoptera. Imaging studies indicate that glomeruli in both hemilobes receive redundant sensory input. Recent simultaneous multi-unit recordings from projection neurons of both tracts revealed widely overlapping response profiles strongly indicating parallel olfactory processing. Whereas lateral-tract neurons respond fast with broad (generalistic) profiles, medial-tract neurons are odorant specific and respond slower. In analogy to "what-" and "where" subsystems in visual pathways, this suggests two parallel olfactory subsystems providing "what-" (quality) and "when" (temporal) information. Temporal response properties may support across-tract coincidence coding in higher centers. Parallel olfactory processing likely enhances perception of complex odorant mixtures to decode the diverse and dynamic olfactory world of a social insect.

  15. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    NASA Technical Reports Server (NTRS)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  16. Solvable biological evolution models with general fitness functions and multiple mutations in parallel mutation-selection scheme

    NASA Astrophysics Data System (ADS)

    Saakian, David B.; Hu, Chin-Kun; Khachatryan, H.

    2004-10-01

    In a recent paper [Phys. Rev. E 69, 046121 (2004)], we used the Suzuki-Trotter formalism to study a quasispecies biological evolution model in a parallel mutation-selection scheme with a single-peak fitness function and a point mutation. In the present paper, we extend this study to evolution models with more general fitness functions or multiple mutations in the parallel mutation-selection scheme. We give analytical equations defining the error thresholds for some general cases of mean-field-like or symmetric mutation schemes and fitness functions. We derive equations for the dynamics in the case of a point mutation and polynomial fitness functions. We derive exact dynamics for two-point mutations, asymmetric mutations, and the four-value spin model with a single-peak fitness function. The same method is applied to the model with a royal road fitness function. We derive the steady-state distribution for the single-peak fitness function.
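
    For orientation, the best-known special case of such error-threshold equations can be stated compactly. The LaTeX fragment below is a hedged restatement of the standard single-peak result for the parallel (Crow-Kimura) scheme, in our own notation rather than the paper's.

    ```latex
    % Hedged restatement, not taken from the paper itself: in the parallel
    % (Crow-Kimura) mutation-selection model with a single-peak fitness
    % landscape, let the master sequence carry fitness advantage $A$, the
    % genome have $N$ sites, and each site mutate at rate $\gamma$. The
    % selective (quasispecies) phase survives only while
    \[
      A > N\gamma ,
    \]
    % and the population delocalizes over sequence space once $A \le N\gamma$.
    ```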

  17. Requirements for implementing real-time control functional modules on a hierarchical parallel pipelined system

    NASA Technical Reports Server (NTRS)

    Wheatley, Thomas E.; Michaloski, John L.; Lumia, Ronald

    1989-01-01

    Analysis of a robot control system leads to a broad range of processing requirements. One fundamental requirement of a robot control system is a microcomputer system that provides sufficient processing capability. The use of multiple processors in a parallel architecture is beneficial for a number of reasons, including better cost performance, modular growth, increased reliability through replication, and flexibility for testing alternate control strategies via different partitioning. A survey of the progression from low-level control synchronizing primitives to higher-level communication tools is presented. The system communication and control mechanisms of existing robot control systems are compared to the hierarchical control model. The impact of this design methodology on current robot control systems is explored.

  18. Memorability of commands learned as keywords or function keys: A parallel to voice recognition interfaces

    SciTech Connect

    Sorn, K.; Schultz, E.E. Jr.

    1987-09-15

    Voice recognition interfaces require users to input keywords to access and control functions. An experiment was conducted to compare users' memory for keywords relative to the names of equivalent function keys. Thirty-five subjects attempted to learn word processing functions as keywords or in terms of the names of function keys which allowed access to and control of these functions. Keyword learning produced a significantly higher proportion of correct recalls and fewer intrusions (false recalls) after both an immediate retention test and an unexpected second test two weeks later. Superior keyword memorability is an important potential advantage of voice recognition interfaces.

  19. ALIX and ESCRT-I/II function as parallel ESCRT-III recruiters in cytokinetic abscission

    PubMed Central

    Christ, Liliane; Wenzel, Eva M.; Liestøl, Knut; Raiborg, Camilla

    2016-01-01

    Cytokinetic abscission, the final stage of cell division where the two daughter cells are separated, is mediated by the endosomal sorting complex required for transport (ESCRT) machinery. The ESCRT-III subunit CHMP4B is a key effector in abscission, whereas its paralogue, CHMP4C, is a component in the abscission checkpoint that delays abscission until chromatin is cleared from the intercellular bridge. How recruitment of these components is mediated during cytokinesis remains poorly understood, although the ESCRT-binding protein ALIX has been implicated. Here, we show that ESCRT-II and the ESCRT-II–binding ESCRT-III subunit CHMP6 cooperate with ESCRT-I to recruit CHMP4B, with ALIX providing a parallel recruitment arm. In contrast to CHMP4B, we find that recruitment of CHMP4C relies predominantly on ALIX. Accordingly, ALIX depletion leads to furrow regression in cells with chromosome bridges, a phenotype associated with abscission checkpoint signaling failure. Collectively, our work reveals a two-pronged recruitment of ESCRT-III to the cytokinetic bridge and implicates ALIX in abscission checkpoint signaling. PMID:26929449

  20. ALIX and ESCRT-I/II function as parallel ESCRT-III recruiters in cytokinetic abscission.

    PubMed

    Christ, Liliane; Wenzel, Eva M; Liestøl, Knut; Raiborg, Camilla; Campsteijn, Coen; Stenmark, Harald

    2016-02-29

    Cytokinetic abscission, the final stage of cell division where the two daughter cells are separated, is mediated by the endosomal sorting complex required for transport (ESCRT) machinery. The ESCRT-III subunit CHMP4B is a key effector in abscission, whereas its paralogue, CHMP4C, is a component in the abscission checkpoint that delays abscission until chromatin is cleared from the intercellular bridge. How recruitment of these components is mediated during cytokinesis remains poorly understood, although the ESCRT-binding protein ALIX has been implicated. Here, we show that ESCRT-II and the ESCRT-II-binding ESCRT-III subunit CHMP6 cooperate with ESCRT-I to recruit CHMP4B, with ALIX providing a parallel recruitment arm. In contrast to CHMP4B, we find that recruitment of CHMP4C relies predominantly on ALIX. Accordingly, ALIX depletion leads to furrow regression in cells with chromosome bridges, a phenotype associated with abscission checkpoint signaling failure. Collectively, our work reveals a two-pronged recruitment of ESCRT-III to the cytokinetic bridge and implicates ALIX in abscission checkpoint signaling.

  1. Corral framework: Trustworthy and fully functional data intensive parallel astronomical pipelines

    NASA Astrophysics Data System (ADS)

    Cabral, J. B.; Sánchez, B.; Beroiz, M.; Domínguez, M.; Lares, M.; Gurovich, S.; Granitto, P.

    2017-07-01

    Data processing pipelines represent an important slice of the astronomical software library, comprising chains of processes that transform raw data into valuable information via data reduction and analysis. In this work we present Corral, a Python framework for astronomical pipeline generation. Corral features a Model-View-Controller design pattern on top of an SQL relational database, capable of handling custom data models, processing stages, and communication alerts; it also provides automatic quality and structural metrics based on unit testing. The Model-View-Controller pattern provides a separation of concerns between the user logic and the data models, delivering at the same time multi-processing and distributed computing capabilities. Corral represents an improvement over commonly found data processing pipelines in astronomy, since the design pattern frees the programmer from dealing with processing flow and parallelization issues, allowing them to focus on the specific algorithms needed for the successive data transformations, and at the same time provides a broad measure of quality over the created pipeline. Corral and working examples of pipelines that use it are available to the community at https://github.com/toros-astro.

  2. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers.

    PubMed

    Spencer, Sarah J; Tamminen, Manu V; Preheim, Sarah P; Guo, Mira T; Briggs, Adrian W; Brito, Ilana L; A Weitz, David; Pitkänen, Leena K; Vigneault, Francois; Juhani Virta, Marko P; Alm, Eric J

    2016-02-01

    Many microbial communities are characterized by high genetic diversity. 16S ribosomal RNA sequencing can determine community members, and metagenomics can determine the functional diversity, but resolving the functional role of individual cells in high throughput remains an unsolved challenge. Here, we describe epicPCR (Emulsion, Paired Isolation and Concatenation PCR), a new technique that links functional genes and phylogenetic markers in uncultured single cells, providing a throughput of hundreds of thousands of cells with costs comparable to one genomic library preparation. We demonstrate the utility of our technique in a natural environment by profiling a sulfate-reducing community in a freshwater lake, revealing both known sulfate reducers and discovering new putative sulfate reducers. Our method is adaptable to any conserved genetic trait and translates genetic associations from diverse microbial samples into a sequencing library that answers targeted ecological questions. Potential applications include identifying functional community members, tracing horizontal gene transfer networks and mapping ecological interactions between microbial cells.

  3. A coarse-grained model for DNA-functionalized spherical colloids, revisited: Effective pair potential from parallel replica simulations

    NASA Astrophysics Data System (ADS)

    Theodorakis, Panagiotis E.; Dellago, Christoph; Kahl, Gerhard

    2013-01-01

    We discuss a coarse-grained model recently proposed by Starr and Sciortino [J. Phys.: Condens. Matter 18, L347 (2006), 10.1088/0953-8984/18/26/L02] for spherical particles functionalized with short single DNA strands. The model incorporates two key aspects of DNA hybridization, i.e., the specificity of binding between DNA bases and the strong directionality of hydrogen bonds. Here, we calculate the effective potential between two DNA-functionalized particles of equal size using a parallel replica protocol. We find that the transition from bonded to unbonded configurations takes place at considerably lower temperatures compared to those that were originally predicted using standard simulations in the canonical ensemble. We put particular focus on DNA-decorations of tetrahedral and octahedral symmetry, as they are promising candidates for the self-assembly into a single-component diamond structure. Increasing colloid size hinders hybridization of the DNA strands, in agreement with experimental findings.

  4. Parallel Changes in Structural and Functional Measures of Optic Nerve Myelination after Optic Neuritis

    PubMed Central

    van der Walt, Anneke; Kolbe, Scott; Mitchell, Peter; Wang, Yejun; Butzkueven, Helmut; Egan, Gary; Yiannikas, Con; Graham, Stuart; Kilpatrick, Trevor; Klistorner, Alexander

    2015-01-01

    Introduction Visual evoked potential (VEP) latency prolongation and optic nerve lesion length after acute optic neuritis (ON) correspond to the degree of demyelination, while subsequent recovery of latency may represent optic nerve remyelination. We aimed to investigate the relationship between multifocal VEP (mfVEP) latency and optic nerve lesion length after acute ON. Methods Thirty acute ON patients were studied at 1, 3, 6 and 12 months using mfVEP and at 1 and 12 months with optic nerve MRI. LogMAR and low contrast visual acuity were documented. By one month, the mfVEP amplitude had recovered sufficiently for latency to be measured in 23 (76.7%) patients, with seven patients having no recordable mfVEP in more than 66% of segments in at least one test. Only data from these 23 patients were analysed further. Results Both latency and lesion length showed significant recovery during the follow-up period. Lesion length and mfVEP latency were highly correlated at 1 (r = 0.94, p < 0.0001) and 12 months (r = 0.75, p < 0.001). Both measures demonstrated a similar trend of recovery. Speed of latency recovery was faster in the early follow-up period, while lesion length shortening remained relatively constant. At 1 month, latency delay worsened by 1.76 ms for each additional 1 mm of lesion length, while at 12 months, 1 mm of lesion length accounted for 1.94 ms of latency delay. Conclusion A strong association between two putative measures of demyelination in early and chronic ON was found. Parallel recovery of both measures could reflect optic nerve remyelination. PMID:26020925

  5. Parallel changes in structural and functional measures of optic nerve myelination after optic neuritis.

    PubMed

    van der Walt, Anneke; Kolbe, Scott; Mitchell, Peter; Wang, Yejun; Butzkueven, Helmut; Egan, Gary; Yiannikas, Con; Graham, Stuart; Kilpatrick, Trevor; Klistorner, Alexander

    2015-01-01

    Visual evoked potential (VEP) latency prolongation and optic nerve lesion length after acute optic neuritis (ON) correspond to the degree of demyelination, while subsequent recovery of latency may represent optic nerve remyelination. We aimed to investigate the relationship between multifocal VEP (mfVEP) latency and optic nerve lesion length after acute ON. Thirty acute ON patients were studied at 1, 3, 6 and 12 months using mfVEP and at 1 and 12 months with optic nerve MRI. LogMAR and low contrast visual acuity were documented. By one month, the mfVEP amplitude had recovered sufficiently for latency to be measured in 23 (76.7%) patients, with seven patients having no recordable mfVEP in more than 66% of segments in at least one test. Only data from these 23 patients were analysed further. Both latency and lesion length showed significant recovery during the follow-up period. Lesion length and mfVEP latency were highly correlated at 1 (r = 0.94, p < 0.0001) and 12 months (r = 0.75, p < 0.001). Both measures demonstrated a similar trend of recovery. Speed of latency recovery was faster in the early follow-up period, while lesion length shortening remained relatively constant. At 1 month, latency delay worsened by 1.76 ms for each additional 1 mm of lesion length, while at 12 months, 1 mm of lesion length accounted for 1.94 ms of latency delay. A strong association between two putative measures of demyelination in early and chronic ON was found. Parallel recovery of both measures could reflect optic nerve remyelination.

  6. PROBING VERY BRIGHT END OF GALAXY LUMINOSITY FUNCTION AT z ≳ 7 USING HUBBLE SPACE TELESCOPE PURE PARALLEL OBSERVATIONS

    SciTech Connect

    Yan Haojing; Yan Lin; Zamojski, Michel A.; Windhorst, Rogier A.; McCarthy, Patrick J.; Fan Xiaohui; Dave, Romeel; Roettgering, Huub J. A.; Koekemoer, Anton M.; Robertson, Brant E.; Cai Zheng

    2011-02-10

    We report the first results from the Hubble Infrared Pure Parallel Imaging Extragalactic Survey, which utilizes the pure parallel orbits of the Hubble Space Telescope to do deep imaging along a large number of random sightlines. To date, our analysis includes 26 widely separated fields observed by the Wide Field Camera 3, which amounts to 122.8 arcmin² in total area. We have found three bright Y098-dropouts, which are candidate galaxies at z ≳ 7.4. One of these objects shows an indication of peculiar variability and its nature is uncertain. The other two objects are among the brightest candidate galaxies at these redshifts known to date (L > 2L*). Such very luminous objects could be the progenitors of the high-mass Lyman break galaxies observed at lower redshifts (up to z ≈ 5). While our sample is still limited in size, it is much less subject to the uncertainty caused by 'cosmic variance' than other samples because it is derived using fields along many random sightlines. We find that the existence of the brightest candidate at z ≈ 7.4 is not well explained by the current luminosity function (LF) estimates at z ≈ 8. However, its inferred surface density could be explained by the prediction from the LFs at z ≈ 7 if it belongs to the high-redshift tail of the galaxy population at z ≈ 7.
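
    For reference, the luminosity functions discussed above are conventionally parameterized by the Schechter form; this is a standard result, not specific to this paper, shown here in luminosity and absolute-magnitude versions:

```latex
\phi(L)\,dL = \phi^{*}\left(\frac{L}{L^{*}}\right)^{\alpha} e^{-L/L^{*}}\,\frac{dL}{L^{*}},
\qquad
\phi(M) = 0.4\ln(10)\,\phi^{*}\left[10^{0.4(M^{*}-M)}\right]^{\alpha+1}\exp\!\left[-10^{0.4(M^{*}-M)}\right]
```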

  7. SPARC: Accurate and efficient finite-difference formulation and parallel implementation of Density Functional Theory: Isolated clusters

    NASA Astrophysics Data System (ADS)

    Ghosh, Swarnava; Suryanarayana, Phanish

    2017-03-01

    As the first component of SPARC (Simulation Package for Ab-initio Real-space Calculations), we present an accurate and efficient finite-difference formulation and parallel implementation of Density Functional Theory (DFT) for isolated clusters. Specifically, utilizing a local reformulation of the electrostatics, the Chebyshev polynomial filtered self-consistent field iteration, and a reformulation of the non-local component of the force, we develop a framework using the finite-difference representation that enables the efficient evaluation of energies and atomic forces to within the desired accuracies in DFT. Through selected examples consisting of a variety of elements, we demonstrate that SPARC obtains exponential convergence in energy and forces with domain size; systematic convergence of the energy and forces with mesh size to the reference plane-wave result, at comparably high rates; forces that are consistent with the energy, both free from any noticeable 'egg-box' effect; and accurate ground-state properties, including equilibrium geometries and vibrational spectra. In addition, for systems consisting of up to thousands of electrons, SPARC displays weak and strong parallel scaling behavior similar to that of well-established and optimized plane-wave implementations, but with a significantly reduced prefactor. Overall, SPARC represents an attractive alternative to plane-wave codes for practical DFT simulations of isolated clusters.
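
    The Chebyshev polynomial filtered iteration mentioned above can be sketched in a few lines. The snippet below is a toy, dense-matrix stand-in (the real code operates on sparse finite-difference Hamiltonians in parallel); the filter bounds a and b and the filter degree m are illustrative choices, not SPARC's defaults.

```python
import numpy as np

def chebyshev_filter(H, X, m, a, b):
    """Degree-m Chebyshev filter that damps the spectrum of H inside
    [a, b] (the unwanted high end) and amplifies eigenvalues below a."""
    e = (b - a) / 2.0            # half-width of the damped interval
    c = (b + a) / 2.0            # center of the damped interval
    Y_prev, Y = X, (H @ X - c * X) / e   # degree-0 and degree-1 terms
    for _ in range(2, m + 1):            # three-term recurrence
        Y_prev, Y = Y, 2.0 * (H @ Y - c * Y) / e - Y_prev
    return Y

# Toy demo on a dense random symmetric "Hamiltonian".
rng = np.random.default_rng(0)
H = rng.standard_normal((200, 200)); H = (H + H.T) / 2.0
X = rng.standard_normal((200, 10))            # trial subspace
b = np.linalg.eigvalsh(H)[-1]                 # upper spectral bound
Y = chebyshev_filter(H, X, m=8, a=-5.0, b=b)  # amplify states below a
Q, _ = np.linalg.qr(Y)                        # orthonormalize the basis
theta = np.linalg.eigvalsh(Q.T @ H @ Q)       # Rayleigh-Ritz eigenvalues
print(theta[:5])                              # approximate lowest states
```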

  8. The functional significance of cortical reorganization and the parallel development of CI therapy.

    PubMed

    Taub, Edward; Uswatte, Gitendra; Mark, Victor W

    2014-01-01

    For the nineteenth and the better part of the twentieth centuries two correlative beliefs were strongly held by almost all neuroscientists and practitioners in the field of neurorehabilitation. The first was that after maturity the adult CNS was hardwired and fixed, and second that in the chronic phase after CNS injury no substantial recovery of function could take place no matter what intervention was employed. However, in the last part of the twentieth century evidence began to accumulate that neither belief was correct. First, in the 1960s and 1970s, in research with primates given a surgical abolition of somatic sensation from a single forelimb, which rendered the extremity useless, it was found that behavioral techniques could convert the limb into an extremity that could be used extensively. Beginning in the late 1980s, the techniques employed with deafferented monkeys were translated into a rehabilitation treatment, termed Constraint Induced Movement therapy or CI therapy, for substantially improving the motor deficit in humans of the upper and lower extremities in the chronic phase after stroke. CI therapy has been applied successfully to other types of damage to the CNS such as traumatic brain injury, cerebral palsy, multiple sclerosis, and spinal cord injury, and it has also been used to improve function in focal hand dystonia and for aphasia after stroke. As this work was proceeding, it was being shown during the 1980s and 1990s that sustained modulation of afferent input could alter the structure of the CNS and that this topographic reorganization could have relevance to the function of the individual. The alteration in these once fundamental beliefs has given rise to important recent developments in neuroscience and neurorehabilitation and holds promise for further increasing our understanding of CNS function and extending the boundaries of what is possible in neurorehabilitation.

  9. The functional significance of cortical reorganization and the parallel development of CI therapy

    PubMed Central

    Taub, Edward; Uswatte, Gitendra; Mark, Victor W.

    2014-01-01

    For the nineteenth and the better part of the twentieth centuries two correlative beliefs were strongly held by almost all neuroscientists and practitioners in the field of neurorehabilitation. The first was that after maturity the adult CNS was hardwired and fixed, and second that in the chronic phase after CNS injury no substantial recovery of function could take place no matter what intervention was employed. However, in the last part of the twentieth century evidence began to accumulate that neither belief was correct. First, in the 1960s and 1970s, in research with primates given a surgical abolition of somatic sensation from a single forelimb, which rendered the extremity useless, it was found that behavioral techniques could convert the limb into an extremity that could be used extensively. Beginning in the late 1980s, the techniques employed with deafferented monkeys were translated into a rehabilitation treatment, termed Constraint Induced Movement therapy or CI therapy, for substantially improving the motor deficit in humans of the upper and lower extremities in the chronic phase after stroke. CI therapy has been applied successfully to other types of damage to the CNS such as traumatic brain injury, cerebral palsy, multiple sclerosis, and spinal cord injury, and it has also been used to improve function in focal hand dystonia and for aphasia after stroke. As this work was proceeding, it was being shown during the 1980s and 1990s that sustained modulation of afferent input could alter the structure of the CNS and that this topographic reorganization could have relevance to the function of the individual. The alteration in these once fundamental beliefs has given rise to important recent developments in neuroscience and neurorehabilitation and holds promise for further increasing our understanding of CNS function and extending the boundaries of what is possible in neurorehabilitation. PMID:25018720

  10. Languages for parallel architectures

    SciTech Connect

    Bakker, J.W.

    1989-01-01

    This book presents mathematical methods for modelling parallel computer architectures, based on the results of ESPRIT's project 415 on computer languages for parallel architectures. Presented are investigations incorporating a wide variety of programming styles, including functional, logic, and object-oriented paradigms. Topics covered include Philips' parallel object-oriented language POOL, lazy functional languages, the languages IDEAL, K-LEAF, FP2, and Petri-net semantics for the AADL language.

  11. The functional and anatomical organization of marsupial neocortex: Evidence for parallel evolution across mammals

    PubMed Central

    Karlen, Sarah J.; Krubitzer, Leah

    2007-01-01

    Marsupials are a diverse group of mammals that occupy a large range of habitats and have evolved a wide array of unique adaptations. Although they are as diverse as placental mammals, our understanding of marsupial brain organization is more limited. Like placental mammals, marsupials have striking similarities in neocortical organization, such as a constellation of cortical fields including S1, S2, V1, V2, and A1, that are functionally, architectonically, and connectionally distinct. In this review, we describe the general lifestyle and morphological characteristics of all marsupials and the organization of somatosensory, motor, visual, and auditory cortex. For each sensory system, we compare the functional organization and the corticocortical and thalamocortical connections of the neocortex across species. Differences between placental and marsupial species are discussed and the theories on neocortical evolution that have been derived from studying marsupials, particularly the idea of a sensorimotor amalgam, are evaluated. Overall, marsupials inhabit a variety of niches and assume many different lifestyles. For example, marsupials occupy terrestrial, arboreal, burrowing, and aquatic environments; some animals are highly social while others are solitary; and different species are carnivorous, herbivorous, or omnivorous. For each of these adaptations, marsupials have evolved an array of morphological, behavioral, and cortical specializations that are strikingly similar to those observed in placental mammals occupying similar habitats, which indicate that there are constraints imposed on evolving nervous systems that result in recurrent solutions to similar environmental challenges. PMID:17507143

  12. The functional and anatomical organization of marsupial neocortex: evidence for parallel evolution across mammals.

    PubMed

    Karlen, Sarah J; Krubitzer, Leah

    2007-06-01

    Marsupials are a diverse group of mammals that occupy a large range of habitats and have evolved a wide array of unique adaptations. Although they are as diverse as placental mammals, our understanding of marsupial brain organization is more limited. Like placental mammals, marsupials have striking similarities in neocortical organization, such as a constellation of cortical fields including S1, S2, V1, V2, and A1, that are functionally, architectonically, and connectionally distinct. In this review, we describe the general lifestyle and morphological characteristics of all marsupials and the organization of somatosensory, motor, visual, and auditory cortex. For each sensory system, we compare the functional organization and the corticocortical and thalamocortical connections of the neocortex across species. Differences between placental and marsupial species are discussed and the theories on neocortical evolution that have been derived from studying marsupials, particularly the idea of a sensorimotor amalgam, are evaluated. Overall, marsupials inhabit a variety of niches and assume many different lifestyles. For example, marsupials occupy terrestrial, arboreal, burrowing, and aquatic environments; some animals are highly social while others are solitary; different species are carnivorous, herbivorous, or omnivorous. For each of these adaptations, marsupials have evolved an array of morphological, behavioral, and cortical specializations that are strikingly similar to those observed in placental mammals occupying similar habitats, which indicate that there are constraints imposed on evolving nervous systems that result in recurrent solutions to similar environmental challenges.

  13. De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing.

    PubMed

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, there is insufficient transcriptomic or genomic information available in public databases. Application of high-throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%), which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.

  14. ITSN-1 controls vesicle recycling at the neuromuscular junction and functions in parallel with DAB-1.

    PubMed

    Wang, Wei; Bouhours, Magali; Gracheva, Elena O; Liao, Edward H; Xu, Keli; Sengar, Ameet S; Xin, Xiaofeng; Roder, John; Boone, Charles; Richmond, Janet E; Zhen, Mei; Egan, Sean E

    2008-05-01

    Intersectins (Itsn) are conserved EH and SH3 domain-containing adaptor proteins. In Drosophila melanogaster, ITSN is required to regulate synaptic morphology, to facilitate efficient synaptic vesicle recycling, and for viability. Here, we report our genetic analysis of Caenorhabditis elegans intersectin. In contrast to Drosophila, C. elegans itsn-1 protein null mutants are viable and display grossly normal locomotion and development. However, motor neurons in these mutants show a dramatic increase in large irregular vesicles and accumulate membrane-associated vesicles at putative endocytic hotspots, approximately 300 nm from the presynaptic density. This defect occurs precisely where endogenous ITSN-1 protein localizes in wild-type animals and is associated with a significant reduction in synaptic vesicle number and reduced frequency of endogenous synaptic events at neuromuscular junctions (NMJs). ITSN-1 forms a stable complex with EHS-1 (Eps15) and is expressed at reduced levels in ehs-1 mutants. Thus, ITSN-1 and EHS-1 together coordinate vesicle recycling at C. elegans NMJs. We also found that both itsn-1 and ehs-1 mutants show poor viability and growth in a Disabled (dab-1) null mutant background. These results show for the first time that intersectin and Eps15 proteins function in the same genetic pathway and appear to function synergistically with the clathrin-coat-associated sorting protein Disabled for viability.

  15. ITSN-1 Controls Vesicle Recycling at the Neuromuscular Junction and Functions in Parallel with DAB-1

    PubMed Central

    Wang, Wei; Bouhours, Magali; Gracheva, Elena O.; Liao, Edward H.; Xu, Keli; Sengar, Ameet S.; Xin, Xiaofeng; Roder, John; Boone, Charles; Richmond, Janet E.; Zhen, Mei; Egan, Sean E.

    2013-01-01

    Intersectins (Itsn) are conserved EH and SH3 domain-containing adaptor proteins. In Drosophila melanogaster, ITSN is required to regulate synaptic morphology, to facilitate efficient synaptic vesicle recycling, and for viability. Here, we report our genetic analysis of Caenorhabditis elegans intersectin. In contrast to Drosophila, C. elegans itsn-1 protein null mutants are viable and display grossly normal locomotion and development. However, motor neurons in these mutants show a dramatic increase in large irregular vesicles and accumulate membrane-associated vesicles at putative endocytic hotspots, approximately 300 nm from the presynaptic density. This defect occurs precisely where endogenous ITSN-1 protein localizes in wild-type animals and is associated with a significant reduction in synaptic vesicle number and reduced frequency of endogenous synaptic events at neuromuscular junctions (NMJs). ITSN-1 forms a stable complex with EHS-1 (Eps15) and is expressed at reduced levels in ehs-1 mutants. Thus, ITSN-1 and EHS-1 together coordinate vesicle recycling at C. elegans NMJs. We also found that both itsn-1 and ehs-1 mutants show poor viability and growth in a Disabled (dab-1) null mutant background. These results show for the first time that intersectin and Eps15 proteins function in the same genetic pathway and appear to function synergistically with the clathrin-coat-associated sorting protein Disabled for viability. PMID:18298590

  16. De Novo Assembly, Characterization and Functional Annotation of Pineapple Fruit Transcriptome through Massively Parallel Sequencing

    PubMed Central

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Background Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, there is insufficient transcriptomic or genomic information available in public databases. Application of high-throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. Methodology/Principal Findings To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%), which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. Conclusions The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple. PMID:23091603

  17. Parallel blind deconvolution of astronomical images based on the fractal energy ratio of the image and regularization of the point spread function

    NASA Astrophysics Data System (ADS)

    Jia, Peng; Cai, Dongmei; Wang, Dong

    2014-11-01

    A parallel blind deconvolution algorithm is presented. The algorithm incorporates constraints on the point spread function (PSF) derived from the physical imaging process. Additionally, in order to obtain an effective restored image, the fractal energy ratio is used as an evaluation criterion to estimate image quality. The algorithm is parallelized at fine granularity to increase calculation speed. Results of numerical and real experiments indicate that the algorithm is effective.
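
    A common serial baseline for blind deconvolution alternates Richardson-Lucy updates of the image and the PSF. The sketch below illustrates only that baseline; the paper's physical PSF constraints, fractal-energy-ratio criterion, and fine-grained parallelization are not reproduced, and the image-sized PSF array and iteration counts are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def blind_richardson_lucy(image, psf, n_outer=10, n_inner=5):
    """Alternating Richardson-Lucy updates of image and PSF.

    Serial toy version; `psf` is assumed to be an image-sized,
    normalized, non-negative array."""
    img = np.full_like(image, image.mean())   # flat initial estimate
    for _ in range(n_outer):
        for _ in range(n_inner):              # update PSF, image fixed
            ratio = image / (fftconvolve(img, psf, mode='same') + 1e-12)
            psf = psf * fftconvolve(ratio, img[::-1, ::-1], mode='same')
            psf /= psf.sum()                  # keep PSF normalized
        for _ in range(n_inner):              # update image, PSF fixed
            ratio = image / (fftconvolve(img, psf, mode='same') + 1e-12)
            img = img * fftconvolve(ratio, psf[::-1, ::-1], mode='same')
    return img, psf
```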

  18. Tablets of functionalized polystyrene beads alone and in combination with solid reagents or catalysts. Preparation and applications in parallel solution and solid phase synthesis.

    PubMed

    Ruhland, Thomas; Holm, Per; Andersen, Kim

    2003-01-01

    Pretreatment of polystyrene beads with a nonpolar organic solvent is the key for the generation of mechanically robust tablets consisting of neat functionalized polystyrene beads, both alone and in combination with solid reagents or catalysts. The novel dosing methodology provides accurately preweighed tablets in virtually any shape and size and with excellent disintegration properties, speeding up parallel solution and solid phase synthesis. The use of tablets is demonstrated in parallel Mitsunobu and acylation reactions.

  19. Parallel changes in cortical neuron biochemistry and motor function in protein-energy malnourished adult rats.

    PubMed

    Alaverdashvili, Mariam; Hackett, Mark J; Caine, Sally; Paterson, Phyllis G

    2017-04-01

    While protein-energy malnutrition (PEM) in the adult has been reported to induce motor abnormalities and exaggerate motor deficits caused by stroke, it is not known whether alterations in mature cortical neurons contribute to the functional deficits. Therefore, we explored whether PEM in adult rats provoked changes in the biochemical profile of neurons in the forelimb and hindlimb regions of the motor cortex. Fourier transform infrared spectroscopic imaging using a synchrotron-generated light source revealed for the first time altered lipid composition in neurons and subcellular domains (cytosol and nuclei) in a cortical layer- and region-specific manner. This change, measured by the area under the curve of the δ(CH2) band, may indicate modifications in membrane fluidity. These PEM-induced biochemical changes were associated with the development of abnormalities in forelimb use and posture. The findings of this study provide a mechanism by which PEM, if not treated, could exacerbate the course of various neurological disorders and diminish treatment efficacy.

  20. Parallel transmit excitation at 1.5 T based on the minimization of a driving function for device heating

    PubMed Central

    Gudino, N.; Sonmez, M.; Yao, Z.; Baig, T.; Nielles-Vallespin, S.; Faranesh, A. Z.; Lederman, R. J.; Martens, M.; Balaban, R. S.; Hansen, M. S.; Griswold, M. A.

    2015-01-01

    Purpose: To provide a rapid method to reduce the radiofrequency (RF) E-field coupling and consequent heating in long conductors in an interventional MRI (iMRI) setup. Methods: A driving function for device heating (W) was defined as the integral of the E-field along the direction of the wire and calculated through a quasistatic approximation. Based on this function, the phases of four independently controlled transmit channels were dynamically changed in a 1.5 T MRI scanner. During the different excitation configurations, the RF-induced heating in a nitinol wire immersed in a saline phantom was measured by fiber-optic temperature sensing. Additionally, a minimization of W as a function of the phase and amplitude values of the different channels, constrained by the homogeneity of the RF excitation field (B1) over a region of interest, was proposed and its results tested on the benchtop. To analyze the validity of the proposed method, RF fields and SAR maps were calculated through finite-difference time-domain (FDTD) simulations using a model of the array and phantom setup tested in the scanner. In addition to phantom experiments, RF-induced heating of an active guidewire inserted in a swine was also evaluated. Results: In the phantom experiment, heating at the tip of the device was reduced by 92% when replacing the body coil by an optimized parallel transmit excitation with the same nominal flip angle. On the benchtop, up to 90% heating reduction was measured when implementing the constrained minimization algorithm with the additional degree of freedom given by independent amplitude control. The computation of the optimum phase and amplitude values was executed in just 12 s using a standard CPU. The FDTD simulations showed that the local SAR at the tip of the wire followed a trend similar to the measured temperature, and also to a quadratic function of W, confirming the validity of the quasistatic approach for the presented problem at 64 MHz. Imaging and heating

  1. Parallel transmit excitation at 1.5 T based on the minimization of a driving function for device heating

    SciTech Connect

    Gudino, N.; Sonmez, M.; Nielles-Vallespin, S.; Faranesh, A. Z.; Lederman, R. J.; Balaban, R. S.; Hansen, M. S.; Yao, Z.; Baig, T.; Martens, M.; Griswold, M. A.

    2015-01-15

    Purpose: To provide a rapid method to reduce the radiofrequency (RF) E-field coupling and consequent heating in long conductors in an interventional MRI (iMRI) setup. Methods: A driving function for device heating (W) was defined as the integral of the E-field along the direction of the wire and calculated through a quasistatic approximation. Based on this function, the phases of four independently controlled transmit channels were dynamically changed in a 1.5 T MRI scanner. During the different excitation configurations, the RF-induced heating in a nitinol wire immersed in a saline phantom was measured by fiber-optic temperature sensing. Additionally, a minimization of W as a function of the phase and amplitude values of the different channels, constrained by the homogeneity of the RF excitation field (B1) over a region of interest, was proposed and its results tested on the benchtop. To analyze the validity of the proposed method, RF fields and SAR maps were calculated through finite-difference time-domain (FDTD) simulations using a model of the array and phantom setup tested in the scanner. In addition to phantom experiments, RF-induced heating of an active guidewire inserted in a swine was also evaluated. Results: In the phantom experiment, heating at the tip of the device was reduced by 92% when replacing the body coil by an optimized parallel transmit excitation with the same nominal flip angle. On the benchtop, up to 90% heating reduction was measured when implementing the constrained minimization algorithm with the additional degree of freedom given by independent amplitude control. The computation of the optimum phase and amplitude values was executed in just 12 s using a standard CPU. The FDTD simulations showed that the local SAR at the tip of the wire followed a trend similar to the measured temperature, and also to a quadratic function of W, confirming the validity of the quasistatic approach for the presented problem at 64 MHz. Imaging and heating
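
    A minimal sketch of the phase optimization described above follows. The per-channel E-field integrals w and the B1 samples are invented numbers, and the hard B1-homogeneity constraint of the paper is replaced by a simple weighted penalty; the sketch only illustrates the structure of the minimization.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical per-channel contributions to the heating driving function:
# w[k] = integral of channel k's E-field along the wire (complex-valued).
w = np.array([1.0 + 0.2j, 0.8 - 0.5j, 0.9 + 0.4j, 1.1 - 0.1j])
# Hypothetical per-channel B1 samples over the ROI (rows = ROI points).
b1 = np.array([[1.0, 0.9, 1.1, 1.0],
               [0.9, 1.0, 1.0, 1.1]], dtype=complex)

def cost(phases):
    drive = np.exp(1j * phases)               # unit-amplitude channels
    W = abs(np.sum(w * drive))                # |driving function|
    roi = np.abs(b1 @ drive)                  # combined |B1| in the ROI
    homogeneity = np.std(roi) / np.mean(roi)  # penalize inhomogeneity
    return W + 10.0 * homogeneity             # weighted trade-off

res = minimize(cost, x0=np.zeros(4), method='Nelder-Mead')
print('optimal phases (rad):', res.x)
```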

  2. Combining fMRI and SNP Data to Investigate Connections Between Brain Function and Genetics Using Parallel ICA

    PubMed Central

    Liu, Jingyu; Pearlson, Godfrey; Windemuth, Andreas; Ruano, Gualberto; Perrone-Bizzozero, Nora I.; Calhoun, Vince

    2009-01-01

    There is current interest in understanding genetic influences on both healthy and disordered brain function. We assessed brain function with functional magnetic resonance imaging (fMRI) data collected during an auditory oddball task—detecting an infrequent sound within a series of frequent sounds. Then, task-related imaging findings were utilized as potential intermediate phenotypes (endophenotypes) to investigate genomic factors derived from a single nucleotide polymorphism (SNP) array. Our target is the linkage of these genomic factors to normal/abnormal brain functionality. We explored parallel independent component analysis (paraICA) as a new method for analyzing multimodal data. The method aims to simultaneously identify independent components of each modality and the relationships between them. When 43 healthy controls and 20 schizophrenia patients, all Caucasian, were studied, we found a correlation of 0.38 between one fMRI component and one SNP component. This fMRI component consisted mainly of parietal lobe activations. The relevant SNP component was contributed to significantly by 10 SNPs located in genes, including those coding for the nicotinic α-7 cholinergic receptor, aromatic amino acid decarboxylase, disrupted in schizophrenia 1, among others. Both fMRI and SNP components showed significant differences in loading parameters between the schizophrenia and control groups (P = 0.0006 for the fMRI component; P = 0.001 for the SNP component). In summary, we constructed a framework to identify interactions between brain functional and genetic information; our findings provide a proof-of-concept that genomic SNP factors can be investigated by using endophenotypic imaging findings in a multivariate format. PMID:18072279
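
    The sketch below conveys the flavor of linking modalities via component loadings. Note that paraICA jointly optimizes both decompositions with the inter-modality correlation in the loop; here, as a simplified stand-in, each modality is decomposed separately with FastICA and the subject-wise loadings are correlated post hoc, on synthetic data.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Toy stand-in for paraICA: decompose each modality separately, then
# correlate subject-wise component scores across modalities. All data
# here are random and the dimensions are invented.
rng = np.random.default_rng(1)
n_subjects = 63
fmri = rng.standard_normal((n_subjects, 500))   # subjects x voxels
snps = rng.standard_normal((n_subjects, 300))   # subjects x SNPs

fmri_loadings = FastICA(n_components=5, random_state=0).fit_transform(fmri)
snp_loadings = FastICA(n_components=5, random_state=0).fit_transform(snps)

# Cross-modality correlation matrix of subject-wise loading parameters.
corr = np.corrcoef(fmri_loadings.T, snp_loadings.T)[:5, 5:]
i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
print(f'strongest fMRI-SNP component link: r = {corr[i, j]:.2f}')
```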

  3. EUPDF: Eulerian Monte Carlo Probability Density Function Solver for Applications With Parallel Computing, Unstructured Grids, and Sprays

    NASA Technical Reports Server (NTRS)

    Raju, M. S.

    1998-01-01

    The success of any solution methodology used in the study of gas-turbine combustor flows depends a great deal on how well it can model the various complex and rate controlling processes associated with the spray's turbulent transport, mixing, chemical kinetics, evaporation, and spreading rates, as well as convective and radiative heat transfer and other phenomena. The phenomena to be modeled, which are controlled by these processes, often strongly interact with each other at different times and locations. In particular, turbulence plays an important role in determining the rates of mass and heat transfer, chemical reactions, and evaporation in many practical combustion devices. The influence of turbulence in a diffusion flame manifests itself in several forms, ranging from the so-called wrinkled, or stretched, flamelets regime to the distributed combustion regime, depending upon how turbulence interacts with various flame scales. Conventional turbulence models have difficulty treating highly nonlinear reaction rates. A solution procedure based on the composition joint probability density function (PDF) approach holds the promise of modeling various important combustion phenomena relevant to practical combustion devices (such as extinction, blowoff limits, and emissions predictions) because it can account for nonlinear chemical reaction rates without making approximations. In an attempt to advance the state-of-the-art in multidimensional numerical methods, we at the NASA Lewis Research Center extended our previous work on the PDF method to unstructured grids, parallel computing, and sprays. EUPDF, which was developed by M.S. Raju of Nyma, Inc., was designed to be massively parallel and could easily be coupled with any existing gas-phase and/or spray solvers. EUPDF can use an unstructured mesh with mixed triangular, quadrilateral, and/or tetrahedral elements. The application of the PDF method showed favorable results when applied to several supersonic
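
    The key property claimed above, that composition-PDF methods treat nonlinear reaction rates without closure approximations, can be seen in a toy particle implementation: each notional particle carries its own composition, so the reaction source is evaluated exactly per particle, and only mixing needs a model (here the standard IEM model). All rate constants and the source term below are invented for illustration.

```python
import numpy as np

# Toy particle-based composition-PDF update with IEM mixing.
rng = np.random.default_rng(2)
phi = rng.uniform(0.0, 1.0, 10_000)   # particle compositions
c_phi, omega, dt = 2.0, 5.0, 1e-3     # IEM constant, mixing freq., step

def source(p):
    return 50.0 * p**2 * (1.0 - p)    # nonlinear "reaction rate"

for _ in range(500):
    # IEM mixing: relax each particle toward the ensemble mean.
    phi += -0.5 * c_phi * omega * (phi - phi.mean()) * dt
    # Reaction: applied exactly per particle, no closure needed.
    phi += source(phi) * dt
print(phi.mean(), phi.var())
```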

  5. Super-resolution non-parametric deconvolution in modelling the radial response function of a parallel plate ionization chamber.

    PubMed

    Kulmala, A; Tenhunen, M

    2012-11-07

    The signal of a dosimetric detector generally depends on the shape and size of the detector's sensitive volume. In order to optimize the performance of the detector and the reliability of the output signal, the effect of the detector size should be corrected or, at least, taken into account. The response of the detector can be modelled using the convolution theorem, which connects the system input (actual dose), output (measured result) and the effect of the detector (response function) by a linear convolution operator. We have developed a super-resolution, non-parametric deconvolution method for determining the radial response function of a cylindrically symmetric ionization chamber. We have demonstrated that the presented deconvolution method is able to determine the radial response of the Roos parallel plate ionization chamber with better than 0.5 mm correspondence with the physical measures of the chamber. In addition, the performance of the method was demonstrated by the excellent agreement between the output factors of stereotactic conical collimators (4-20 mm diameter) measured by the Roos chamber, whose detector size is larger than the measured field, and those measured by the reference detector (diode). The presented deconvolution method has potential for providing reference data for more accurate physical models of the ionization chamber, as well as for improving and enhancing the performance of detectors in specific dosimetric problems.
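
    The convolution model referred to above is easy to state in code. The sketch below fabricates a 1D example (invented field size, Gaussian stand-in for the chamber's radial response) and applies a naive Tikhonov-damped Fourier deconvolution; the paper's super-resolution, non-parametric estimator is deliberately not reproduced.

```python
import numpy as np

# Convolution model: measured profile = actual dose (*) detector response.
x = np.linspace(-20, 20, 401)                  # position (mm)
actual = np.where(np.abs(x) < 5, 1.0, 0.0)     # ideal 10 mm field
response = np.exp(-x**2 / (2 * 2.0**2))        # Gaussian stand-in kernel
response /= response.sum()                     # normalize to unit area
measured = np.convolve(actual, response, mode='same')

# Naive Fourier deconvolution, with Tikhonov damping of noisy frequencies.
eps = 1e-3
A = np.fft.fft(measured)
R = np.fft.fft(np.fft.ifftshift(response))     # center kernel at index 0
recovered = np.real(np.fft.ifft(A * np.conj(R) / (np.abs(R)**2 + eps)))
```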

  6. Less is more: Independent loss-of-function OCIMENE SYNTHASE alleles parallel pollination syndrome diversification in monkeyflowers (Mimulus).

    PubMed

    Peng, Foen; Byers, Kelsey J R P; Bradshaw, Harvey D

    2017-07-19

    Pollinator-mediated selection on flower phenotypes (e.g., shape, color, scent) is key to understanding the adaptive radiation of angiosperms, many of which have evolved specialized relationships with a particular guild of animal pollinators (e.g., birds, bats, moths, bees). E-β-Ocimene, a monoterpene produced by OCIMENE SYNTHASE (OS) in Mimulus lewisii, is a floral scent important in attracting the species' bumblebee pollinators. The taxa closely related to M. lewisii have evolved several different pollination syndromes, including hummingbird pollination and self-pollination (autogamy). We are interested in how floral scent variation contributed to species diversification in this clade. We analyzed variation in E-β-ocimene emission within this Mimulus clade and explored its molecular basis through a combination of DNA sequencing, reverse transcriptase PCR, and enzyme functional analysis in vitro. We found that none of the taxa, other than M. lewisii, emitted E-β-ocimene from flowers, but the molecular basis underlying the loss of E-β-ocimene emission is unique to each taxon, including deletion, missense, or frameshift mutations in the OS gene, and potential posttranscriptional downregulation. The molecular evidence suggests that parallel loss-of-function in OS is the best explanation for the observed pattern of E-β-ocimene emission, likely as the result of natural selection. © 2017 Botanical Society of America.

  7. Independent parallel functions of p19 plant viral suppressor of RNA silencing required for effective suppressor activity

    PubMed Central

    Várallyay, Éva; Oláh, Enikő; Havelda, Zoltán

    2014-01-01

    Plant viruses ubiquitously mediate the induction of miR168 through the activities of viral suppressors of RNA silencing (VSRs), controlling the accumulation of ARGONAUTE1 (AGO1), one of the main components of the RNA silencing-based host defence system. Here we used a mutant Tombusvirus p19 VSR (p19-3M), disabled in its main suppressor function of small interfering RNA (siRNA) binding, to investigate the biological role of VSR-mediated miR168 induction. Infection with the mutant virus carrying p19-3M VSR resulted in a suppressed recovery phenotype despite the presence of free virus-specific siRNAs. Analysis of the infected plants revealed that the mutant p19-3M VSR is able to induce the miR168 level controlling the accumulation of the antiviral AGO1, and this activity is associated with enhanced accumulation of viral RNAs. Moreover, saturation of the siRNA-binding capacity of p19 VSR mediated by defective interfering RNAs did not influence the miR168-inducing activity. Our data indicate that p19 VSR possesses two independent silencing suppressor functions, viral siRNA binding and miR168-mediated AGO1 control, both of which are required to efficiently cope with the RNA silencing-based host defence. This finding suggests that the p19 VSR protein evolved independent parallel capacities to block the host defence at multiple levels. PMID:24062160

  8. Independent parallel functions of p19 plant viral suppressor of RNA silencing required for effective suppressor activity.

    PubMed

    Várallyay, Éva; Oláh, Eniko; Havelda, Zoltán

    2014-01-01

    Plant viruses ubiquitously mediate the induction of miR168 through the activities of viral suppressors of RNA silencing (VSRs), controlling the accumulation of ARGONAUTE1 (AGO1), one of the main components of the RNA silencing-based host defence system. Here we used a mutant Tombusvirus p19 VSR (p19-3M), disabled in its main suppressor function of small interfering RNA (siRNA) binding, to investigate the biological role of VSR-mediated miR168 induction. Infection with the mutant virus carrying p19-3M VSR resulted in a suppressed recovery phenotype despite the presence of free virus-specific siRNAs. Analysis of the infected plants revealed that the mutant p19-3M VSR is able to induce the miR168 level controlling the accumulation of the antiviral AGO1, and this activity is associated with enhanced accumulation of viral RNAs. Moreover, saturation of the siRNA-binding capacity of p19 VSR mediated by defective interfering RNAs did not influence the miR168-inducing activity. Our data indicate that p19 VSR possesses two independent silencing suppressor functions, viral siRNA binding and miR168-mediated AGO1 control, both of which are required to efficiently cope with the RNA silencing-based host defence. This finding suggests that the p19 VSR protein evolved independent parallel capacities to block the host defence at multiple levels.

  9. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication

    NASA Astrophysics Data System (ADS)

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimensional (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’-shaped microchannel. The key novelty of our approach lies in rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions, including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in a microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of polystyrene (PS) particles and cancer cells of different sizes. The filter can be cleaned by reversing the flow and reused many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips.
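
    Multifoci generation with an SLM is commonly done with the 'gratings and lenses' construction: the phase-only hologram is the argument of a superposition of blazed gratings, one per target focus. The sketch below assumes this generic construction with invented spot positions and SLM geometry; it is not the calibration or scan strategy of the paper.

```python
import numpy as np

# "Gratings and lenses" multifocus phase hologram (generic construction).
N = 512
yy, xx = np.mgrid[0:N, 0:N] / N - 0.5          # normalized SLM plane
spots = [(-40, 0), (0, 0), (40, 0)]            # focus offsets (invented)
field = np.zeros((N, N), dtype=complex)
for kx, ky in spots:
    field += np.exp(2j * np.pi * (kx * xx + ky * yy))  # one grating/spot
phase = np.angle(field)                        # phase-only hologram

# Far-field check: the FFT of the phase mask shows one peak per focus.
farfield = np.abs(np.fft.fftshift(np.fft.fft2(np.exp(1j * phase))))**2
```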

  10. Postsynaptic inositol 1,4,5-trisphosphate signaling maintains presynaptic function of parallel fiber–Purkinje cell synapses via BDNF

    PubMed Central

    Furutani, Kazuharu; Okubo, Yohei; Kakizawa, Sho; Iino, Masamitsu

    2006-01-01

    The maintenance of synaptic functions is essential for neuronal information processing, but cellular mechanisms that maintain synapses in the adult brain are not well understood. Here, we report an activity-dependent maintenance mechanism of parallel fiber (PF)–Purkinje cell (PC) synapses in the cerebellum. When postsynaptic metabotropic glutamate receptor (mGluR) or inositol 1,4,5-trisphosphate (IP3) signaling was chronically inhibited in vivo, PF–PC synaptic strength decreased because of a decreased transmitter release probability. The same effects were observed when PF activity was inhibited in vivo by the suppression of NMDA receptor-mediated inputs to granule cells. PF–PC synaptic strength similarly decreased after the in vivo application of an antibody against brain-derived neurotrophic factor (BDNF). Furthermore, the weakening of synaptic connection caused by the blockade of mGluR–IP3 signaling was reversed by the in vivo application of BDNF. These results indicate that a signaling cascade comprising PF activity, postsynaptic mGluR–IP3 signaling and subsequent BDNF signaling maintains presynaptic functions in the mature cerebellum. PMID:16709674

  11. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication

    PubMed Central

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimensional (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’-shaped microchannel. The key novelty of our approach lies in rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions, including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in a microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of polystyrene (PS) particles and cancer cells of different sizes. The filter can be cleaned by reversing the flow and reused many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips. PMID:26818119

  12. A Three-way Parallel ICA Approach to Analyze Links among Genetics, Brain Structure and Brain Function

    PubMed Central

    Vergara, Victor M.; Ulloa, Alvaro; Calhoun, Vince D.; Boutte, David; Chen, Jiayu; Liu, Jingyu

    2014-01-01

    Multi-modal data analysis techniques, such as the Parallel Independent Component Analysis (pICA), are essential in neuroscience, medical imaging and genetic studies. The pICA algorithm allows the simultaneous decomposition of up to two data modalities, achieving better performance than separate ICA decompositions and enabling the discovery of links between modalities. However, advances in data acquisition techniques facilitate the collection of more than two data modalities from each subject. Examples of commonly measured modalities include genetic information, structural magnetic resonance imaging (MRI) and functional MRI. In order to take full advantage of the available data, this work extends the pICA approach to incorporate three modalities in one comprehensive analysis. Simulations demonstrate the three-way pICA performance in identifying pairwise links between modalities and estimating independent components which more closely resemble the true sources than components found by pICA or separate ICA analyses. In addition, the three-way pICA algorithm is applied to real experimental data obtained from a study that investigates genetic effects on alcohol dependence. Considered data modalities include functional MRI (contrast images during an alcohol exposure paradigm), gray matter concentration images from structural MRI and genetic single nucleotide polymorphism (SNP). The three-way pICA approach identified links between a SNP component (pointing to brain function and mental disorder associated genes, including BDNF, GRIN2B and NRG1), a functional component related to increased activation in the precuneus area, and a gray matter component comprising part of the default mode network and the caudate. Although such findings need further verification, the simulation and in vivo results validate the three-way pICA algorithm presented here as a useful tool in biomedical data fusion applications. PMID:24795156

  13. A three-way parallel ICA approach to analyze links among genetics, brain structure and brain function.

    PubMed

    Vergara, Victor M; Ulloa, Alvaro; Calhoun, Vince D; Boutte, David; Chen, Jiayu; Liu, Jingyu

    2014-09-01

    Multi-modal data analysis techniques, such as the Parallel Independent Component Analysis (pICA), are essential in neuroscience, medical imaging and genetic studies. The pICA algorithm allows the simultaneous decomposition of up to two data modalities, achieving better performance than separate ICA decompositions and enabling the discovery of links between modalities. However, advances in data acquisition techniques facilitate the collection of more than two data modalities from each subject. Examples of commonly measured modalities include genetic information, structural magnetic resonance imaging (MRI) and functional MRI. In order to take full advantage of the available data, this work extends the pICA approach to incorporate three modalities in one comprehensive analysis. Simulations demonstrate the three-way pICA performance in identifying pairwise links between modalities and estimating independent components which more closely resemble the true sources than components found by pICA or separate ICA analyses. In addition, the three-way pICA algorithm is applied to real experimental data obtained from a study that investigates genetic effects on alcohol dependence. Considered data modalities include functional MRI (contrast images during an alcohol exposure paradigm), gray matter concentration images from structural MRI and genetic single nucleotide polymorphism (SNP). The three-way pICA approach identified links between a SNP component (pointing to brain function and mental disorder associated genes, including BDNF, GRIN2B and NRG1), a functional component related to increased activation in the precuneus area, and a gray matter component comprising part of the default mode network and the caudate. Although such findings need further verification, the simulation and in vivo results validate the three-way pICA algorithm presented here as a useful tool in biomedical data fusion applications.

  14. Parallel Functional Activity Profiling Reveals Valvulopathogens Are Potent 5-Hydroxytryptamine2B Receptor Agonists: Implications for Drug Safety Assessment

    PubMed Central

    Huang, Xi-Ping; Setola, Vincent; Yadav, Prem N.; Allen, John A.; Rogan, Sarah C.; Hanson, Bonnie J.; Revankar, Chetana; Robers, Matt; Doucette, Chris

    2009-01-01

    Drug-induced valvular heart disease (VHD) is a serious side effect of a few medications, including some that are on the market. Pharmacological studies of VHD-associated medications (e.g., fenfluramine, pergolide, methysergide, and cabergoline) have revealed that they and/or their metabolites are potent 5-hydroxytryptamine2B (5-HT2B) receptor agonists. We have shown that activation of 5-HT2B receptors on human heart valve interstitial cells in vitro induces a proliferative response reminiscent of the fibrosis that typifies VHD. To identify current or future drugs that might induce VHD, we screened approximately 2200 U.S. Food and Drug Administration (FDA)-approved or investigational medications to identify 5-HT2B receptor agonists, using calcium-based high-throughput screening. Of these 2200 compounds, 27 were 5-HT2B receptor agonists (hits); 14 of these had previously been identified as 5-HT2B receptor agonists, including seven bona fide valvulopathogens. Six of the hits (guanfacine, quinidine, xylometazoline, oxymetazoline, fenoldopam, and ropinirole) are approved medications. Twenty-three of the hits were then “functionally profiled” (i.e., assayed in parallel for 5-HT2B receptor agonism using multiple readouts to test for functional selectivity). In these assays, the known valvulopathogens were efficacious at concentrations as low as 30 nM, whereas the other compounds were less so. Hierarchical clustering analysis of the pEC50 data revealed that ropinirole (which is not associated with valvulopathy) was clearly segregated from known valvulopathogens. Taken together, our data demonstrate that patterns of 5-HT2B receptor functional selectivity might be useful for identifying compounds likely to induce valvular heart disease. PMID:19570945
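
    The hierarchical clustering step described above can be reproduced in outline with SciPy. The pEC50 matrix below is invented (three hypothetical functional readouts per compound) and serves only to show how valvulopathogens with uniformly high potencies would segregate from weaker agonists such as ropinirole.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical pEC50 profiles (compounds x functional assays).
names = ['fenfluramine', 'pergolide', 'cabergoline',
         'ropinirole', 'guanfacine']
pec50 = np.array([[7.5, 7.2, 7.8],
                  [7.9, 7.6, 8.1],
                  [7.7, 7.4, 7.9],
                  [5.9, 5.5, 6.0],
                  [5.6, 5.8, 5.4]])

Z = linkage(pdist(pec50, metric='euclidean'), method='average')
labels = fcluster(Z, t=2, criterion='maxclust')
for name, lab in zip(names, labels):
    print(f'{name}: cluster {lab}')   # high-potency agonists co-cluster
```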

  15. Functional Traits in Parallel Evolutionary Radiations and Trait-Environment Associations in the Cape Floristic Region of South Africa.

    PubMed

    Mitchell, Nora; Moore, Timothy E; Mollmann, Hayley Kilroy; Carlson, Jane E; Mocko, Kerri; Martinez-Cabrera, Hugo; Adams, Christopher; Silander, John A; Jones, Cynthia S; Schlichting, Carl D; Holsinger, Kent E

    2015-04-01

    Evolutionary radiations with extreme levels of diversity present a unique opportunity to study the role of the environment in plant evolution. If environmental adaptation played an important role in such radiations, we expect to find associations between functional traits and key climatic variables. Similar trait-environment associations across clades may reflect common responses, while contradictory associations may suggest lineage-specific adaptations. Here, we explore trait-environment relationships in two evolutionary radiations in the fynbos biome of the highly biodiverse Cape Floristic Region (CFR) of South Africa. Protea and Pelargonium are morphologically and evolutionarily diverse genera that typify the CFR yet are substantially different in growth form and morphology. Our analytical approach employs a Bayesian multiple-response generalized linear mixed-effects model, taking into account covariation among traits and controlling for phylogenetic relationships. Of the pairwise trait-environment associations tested, 6 out of 24 were in the same direction and 2 out of 24 were in opposite directions, with the latter apparently reflecting alternative life-history strategies. These findings demonstrate that trait diversity within two plant lineages may reflect both parallel and idiosyncratic responses to the environment, rather than all taxa conforming to a global-scale pattern. Such insights are essential for understanding how trait-environment associations arise and how they influence species diversification.

  16. Parallelization and Improvements of the Generalized Born Model with a Simple sWitching Function for Modern Graphics Processors

    PubMed Central

    Arthur, Evan J.; Brooks, Charles L.

    2016-01-01

    Two fundamental challenges of simulating biologically relevant systems are the rapid calculation of the energy of solvation, and the trajectory length of a given simulation. The Generalized Born model with a Simple sWitching function (GBSW) addresses these issues by using an efficient approximation of Poisson–Boltzmann (PB) theory to calculate each solute atom's free energy of solvation, the gradient of this potential, and the subsequent forces of solvation without the need for explicit solvent molecules. This study presents a parallel refactoring of the original GBSW algorithm and its implementation on newly available, low cost graphics chips with thousands of processing cores. Depending on the system size and nonbonded force cutoffs, the new GBSW algorithm offers speed increases of between one and two orders of magnitude over previous implementations while maintaining similar levels of accuracy. We find that much of the algorithm scales linearly with an increase of system size, which makes this water model cost effective for solvating large systems. Additionally, we utilize our GPU-accelerated GBSW model to fold the model system chignolin, and in doing so we demonstrate that these speed enhancements now make accessible folding studies of peptides and potentially small proteins. PMID:26786647
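
    For context, GBSW and related GB variants build on the pairwise Still et al. (1990) form of the solvation free energy; in GBSW the effective Born radii R_i come from a volume integral smoothed by the switching function:

        \Delta G_{\mathrm{GB}} = -\frac{1}{2}\left(\frac{1}{\epsilon_{\mathrm{in}}} - \frac{1}{\epsilon_{\mathrm{out}}}\right) \sum_{i,j} \frac{q_i q_j}{f_{\mathrm{GB}}(r_{ij})},
        \qquad f_{\mathrm{GB}} = \sqrt{r_{ij}^2 + R_i R_j \exp\!\left(-\frac{r_{ij}^2}{4 R_i R_j}\right)}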

  17. Kinematic Modeling and Function Generation for Non-linear Curves Using 5R Double Arm Parallel Manipulator

    NASA Astrophysics Data System (ADS)

    Keshavkumar Kamaliya, Parth; Patel, Yashavant Kumar Dashrathlal

    2016-01-01

    Double-arm configurations using parallel manipulators mimic human arm motions in either planar or spatial workspaces. These configurations are currently attractive to researchers, as they can also replace human workers without major redesign of industrial workplaces. The joint-range limitations of human arms can be overcome by replacing the joints with revolute or spherical joints in the manipulator, so the workspace can be utilized to the maximum extent. A planar configuration with five revolute joints (5R) is considered to imitate human arm motions in a plane using a Double Arm Manipulator (DAM). Position analysis for a tool held in the end links of the configuration is carried out using Pro/mechanism in Creo® as well as SimMechanics. D-H parameters are formulated, and results derived from MATLAB programs developed for them are compared with the mechanism simulation and SimMechanics results. An inverse kinematics model is developed for trajectory planning, so that the tool trajectory is traced in a continuous and smooth sequence. Polynomial functions are derived for position, velocity, and acceleration along linear and non-linear curves in joint space. Analytical results obtained for trajectory planning are validated against simulation results from Creo®.
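
    The joint-space polynomials mentioned here are standard; a minimal sketch of a cubic trajectory segment with zero boundary velocities (illustrative only, not the paper's code):

        import numpy as np

        def cubic_joint_trajectory(q0, qf, T, n=100):
            """q(t) = q0 + 3*D*s**2 - 2*D*s**3 with s = t/T and D = qf - q0,
            which satisfies q(0) = q0, q(T) = qf, q'(0) = q'(T) = 0."""
            t = np.linspace(0.0, T, n)
            s = t / T
            D = qf - q0
            q = q0 + 3*D*s**2 - 2*D*s**3        # position
            qd = (6*D/T) * (s - s**2)           # velocity
            qdd = (6*D/T**2) * (1 - 2*s)        # acceleration
            return t, q, qd, qdd

        t, q, qd, qdd = cubic_joint_trajectory(0.0, np.pi/3, T=2.0)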

  18. SPARC: Accurate and efficient finite-difference formulation and parallel implementation of Density Functional Theory: Extended systems

    NASA Astrophysics Data System (ADS)

    Ghosh, Swarnava; Suryanarayana, Phanish

    2017-07-01

    As the second component of SPARC (Simulation Package for Ab-initio Real-space Calculations), we present an accurate and efficient finite-difference formulation and parallel implementation of Density Functional Theory (DFT) for extended systems. Specifically, employing a local formulation of the electrostatics, the Chebyshev polynomial filtered self-consistent field iteration, and a reformulation of the non-local force component, we develop a finite-difference framework wherein both the energy and atomic forces can be efficiently calculated to within desired accuracies in DFT. We demonstrate using a wide variety of materials systems that SPARC achieves high convergence rates in energy and forces with respect to spatial discretization against reference plane-wave results; exponential convergence in energies and forces with respect to vacuum size for slabs and wires; energies and forces that are consistent and display negligible 'egg-box' effect; accurate properties of crystals, slabs, and wires; and negligible drift in molecular dynamics simulations. We also demonstrate that the weak and strong scaling behavior of SPARC is similar to well-established and optimized plane-wave implementations for systems consisting of up to thousands of electrons, but with a significantly reduced prefactor. Overall, SPARC represents an attractive alternative to plane-wave codes for performing DFT simulations of extended systems.
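
    The real-space discretization at the heart of such codes is a finite-difference stencil; a minimal second-order Laplacian sketch on a periodic grid (SPARC itself uses high-order stencils and domain decomposition):

        import numpy as np

        def fd_laplacian(f, h):
            """Second-order central-difference Laplacian on a periodic 3D grid."""
            lap = np.zeros_like(f)
            for axis in range(3):
                lap += (np.roll(f, 1, axis) - 2.0*f + np.roll(f, -1, axis)) / h**2
            return lap

        f = np.random.rand(32, 32, 32)
        print(fd_laplacian(f, h=0.2).shape)  # (32, 32, 32)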

  19. Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

    NASA Astrophysics Data System (ADS)

    Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

    2010-12-01

    Orbital-free density functional theory (OFDFT) is a first-principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization.
    New version program summary:
    Program title: PROFESS
    Catalogue identifier: AEBN_v2_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 68 721
    No. of bytes in distributed program, including test data, etc.: 1 708 547
    Distribution format: tar.gz
    Programming language: Fortran 90
    Computer:
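
    As background, standard OFDFT (not specific to PROFESS): the kinetic energy is approximated by explicit density functionals such as the Thomas-Fermi and von Weizsäcker terms, and the ground state is a constrained minimization over the density:

        T_{\mathrm{TF}}[\rho] = \frac{3}{10}\,(3\pi^2)^{2/3} \int \rho^{5/3}\, d\mathbf{r},
        \qquad T_{\mathrm{vW}}[\rho] = \frac{1}{8} \int \frac{|\nabla \rho|^2}{\rho}\, d\mathbf{r},
        \qquad \min_{\rho \ge 0,\; \int \rho\, d\mathbf{r} = N} E[\rho]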

  20. Casimir amplitudes and capillary condensation of near-critical fluids between parallel plates: renormalized local functional theory.

    PubMed

    Okamoto, Ryuichi; Onuki, Akira

    2012-03-21

    We investigate the critical behavior of a near-critical fluid confined between two parallel plates in contact with a reservoir by calculating the order parameter profile and the Casimir amplitudes (for the force density and for the grand potential). Our results are applicable to one-component fluids and binary mixtures. We assume that the walls absorb one of the fluid components selectively for binary mixtures. We propose a renormalized local functional theory accounting for the fluctuation effects. Analysis is performed in the plane of the temperature T and the order parameter in the reservoir, ψ_∞. Our theory is universal if the physical quantities are scaled appropriately. If the component favored by the walls is slightly poor in the reservoir, there appears a line of first-order phase transition of capillary condensation outside the bulk coexistence curve. The excess adsorption changes discontinuously between condensed and noncondensed states at the transition. With increasing T, the transition line ends at a capillary critical point T = T_c^ca slightly lower than the bulk critical temperature T_c (for an upper critical solution temperature). The Casimir amplitudes are larger than their critical point values by 10-100 times at off-critical compositions near the capillary condensation line.

  1. Parallel rendering

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  2. Parallel computation

    NASA Astrophysics Data System (ADS)

    Huberman, Bernardo A.

    1989-11-01

    This paper reviews three different aspects of parallel computation which are useful for physics. The first part deals with special architectures for parallel computing (SIMD and MIMD machines) and their differences, with examples of their uses. The second section discusses the speedup that can be achieved in parallel computation and the constraints generated by the issues of communication and synchrony. The third part describes computation by distributed networks of powerful workstations without global controls and the issues involved in understanding their behavior.

  3. Cobtorin target analysis reveals that pectin functions in the deposition of cellulose microfibrils in parallel with cortical microtubules.

    PubMed

    Yoneda, Arata; Ito, Takuya; Higaki, Takumi; Kutsuna, Natsumaro; Saito, Tamio; Ishimizu, Takeshi; Osada, Hiroyuki; Hasezawa, Seiichiro; Matsui, Minami; Demura, Taku

    2010-11-01

    Cellulose and pectin are major components of primary cell walls in plants, and it is believed that their mechanical properties are important for cell morphogenesis. It has been hypothesized that cortical microtubules guide the movement of cellulose microfibril synthase in a direction parallel with the microtubules, but the mechanism by which this alignment occurs remains unclear. We have previously identified cobtorin as an inhibitor that perturbs the parallel relationship between cortical microtubules and nascent cellulose microfibrils. In this study, we searched for the protein target of cobtorin, and we found that overexpression of pectin methylesterase and polygalacturonase suppressed the cobtorin-induced cell-swelling phenotype. Furthermore, treatment with polygalacturonase restored the deposition of cellulose microfibrils in the direction parallel with cortical microtubules, and cobtorin perturbed the distribution of methylated pectin. These results suggest that control over the properties of pectin is important for the deposition of cellulose microfibrils and/or the maintenance of their orientation parallel with the cortical microtubules.

  4. Parallel machines: Parallel machine languages

    SciTech Connect

    Iannucci, R.A. )

    1990-01-01

    This book presents a framework for understanding the tradeoffs between the conventional view and the dataflow view, with the objective of discovering the critical hardware structures which must be present in any scalable, general-purpose parallel computer to effectively tolerate latency and synchronization costs. The author presents an approach to scalable general-purpose parallel computation. Linguistic concerns, compiling issues, intermediate language issues, and hardware/technological constraints are presented as a combined approach to architectural development. This book presents the notion of a parallel machine language.

  5. Gem-1 encodes an SLC16 monocarboxylate transporter-related protein that functions in parallel to the gon-2 TRPM channel during gonad development in Caenorhabditis elegans.

    PubMed

    Kemp, Benedict J; Church, Diane L; Hatzold, Julia; Conradt, Barbara; Lambie, Eric J

    2009-02-01

    The gon-2 gene of Caenorhabditis elegans encodes a TRPM cation channel required for gonadal cell divisions. In this article, we demonstrate that the gonadogenesis defects of gon-2 loss-of-function mutants (including a null allele) can be suppressed by gain-of-function mutations in the gem-1 (gon-2 extragenic modifier) locus. gem-1 encodes a multipass transmembrane protein that is similar to SLC16 family monocarboxylate transporters. Inactivation of gem-1 enhances the gonadogenesis defects of gon-2 hypomorphic mutations, suggesting that these two genes probably act in parallel to promote gonadal cell divisions. GEM-1::GFP is expressed within the gonadal precursor cells and localizes to the plasma membrane. Therefore, we propose that GEM-1 acts in parallel to the GON-2 channel to promote cation uptake within the developing gonad.

  6. Three pillars for achieving quantum mechanical molecular dynamics simulations of huge systems: Divide-and-conquer, density-functional tight-binding, and massively parallel computation.

    PubMed

    Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi

    2016-08-05

    The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. Functions to perform large-scale geometry optimization and molecular dynamics on the DC-DFTB potential energy surface are implemented in a program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program: a single-point energy-gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer.
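
    For orientation, the step being parallelized is root-finding for the Fermi level: mu solves the electron-count equation sum_i 2 f(eps_i; mu) = N_e. A minimal serial bisection sketch (hypothetical spectrum; the paper's contribution is an interpolation-based parallel scheme replacing this loop):

        import numpy as np

        def fermi_level(eigs, n_elec, kT=0.01, tol=1e-10):
            """Bisection on mu so that the Fermi-Dirac occupations sum to n_elec."""
            occ_sum = lambda mu: (2.0 / (1.0 + np.exp((eigs - mu) / kT))).sum()
            lo, hi = eigs.min() - 10*kT, eigs.max() + 10*kT
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if occ_sum(mid) < n_elec:
                    lo = mid          # too few electrons: raise mu
                else:
                    hi = mid          # too many electrons: lower mu
            return 0.5 * (lo + hi)

        print(fermi_level(np.linspace(-1.0, 1.0, 100), n_elec=60))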

  7. Parallel pipelining

    SciTech Connect

    Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.

    1995-09-01

    In this paper the authors introduce the idea of parallel pipelining for water-lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system, and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as N = (R/r)^α, where r and R are the radii of the small and large pipes, respectively, and α = 4 or 19/7 when the lubricating water flow is laminar or turbulent.
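
    A quick worked example of the pipe-count estimate (hypothetical 10:1 radius ratio):

        # N = (R/r)**alpha, alpha = 4 (laminar) or 19/7 (turbulent water film)
        ratio = 10.0
        print(ratio**4)        # 10000.0 small pipes, laminar case
        print(ratio**(19/7))   # ~518 small pipes, turbulent case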

  8. [High-resolution functional cardiac MR imaging using density-weighted real-time acquisition and a combination of compressed sensing and parallel imaging for image reconstruction].

    PubMed

    Wech, T; Gutberlet, M; Greiser, A; Stäb, D; Ritter, C O; Beer, M; Hahn, D; Köstler, H

    2010-08-01

    The aim of this study was to perform high-resolution functional MR imaging using accelerated density-weighted real-time acquisition (DE) and a combination of compressed sensing (CO) and parallel imaging for image reconstruction. Measurements were performed on a 3 T whole-body system equipped with a dedicated 32-channel body array coil. A one-dimensional density-weighted spin-warp technique was used, i.e., non-equidistant phase-encoding steps were acquired. The two acceleration techniques, compressed sensing and parallel imaging, were applied one after the other. From a complete Cartesian k-space, a four-fold uniformly undersampled k-space was created. In addition, each undersampled time frame was further undersampled by an additional acceleration factor of 2.1 using an individual density-weighted undersampling pattern for each time frame. Simulations were performed using data of a conventional human in-vivo cine examination, and in-vivo measurements of the human heart were carried out employing an adapted real-time sequence. High-quality DECO parallel-imaging real-time images of human cardiac function could be acquired. An acceleration factor of 8.4 could be achieved, making it possible to maintain the high spatial and temporal resolution without significant noise enhancement. DECO parallel imaging facilitates high acceleration factors, which allows real-time MR acquisition of heart dynamics and function with an image quality comparable to that conventionally achieved with clinically established triggered cine imaging.

  9. rasterEngine: an easy-to-use R function for applying complex geostatistical models to raster datasets in a parallel computing environment

    NASA Astrophysics Data System (ADS)

    Greenberg, J. A.

    2013-12-01

    As geospatial analyses progress in tandem with increasing availability of large complex geographic data sets and high performance computing (HPC), there is an increasing gap in the ability of end-user tools to take advantage of these advances. Specifically, the practical implementation of complex statistical models on large gridded geographic datasets (e.g. remote sensing analysis, species distribution mapping, topographic transformations, and local neighborhood analyses) currently requires a significant knowledge base. A user must be proficient in the chosen model as well as the nuances of scientific programming, raster data models, memory management, parallel computing, and system design. This is further complicated by the fact that many of the cutting-edge analytical tools were developed for non-geospatial datasets and are not part of standard GIS packages, but are available in scientific computing languages such as R and MATLAB. We present a computing function 'rasterEngine' written in the R scientific computing language and part of the CRAN package 'spatial.tools' with these challenges in mind. The goal of rasterEngine is to allow a user to quickly develop and apply analytical models within the R computing environment to arbitrarily large gridded datasets, taking advantage of available parallel computing resources, and without requiring a deep understanding of HPC and raster data models. We provide several examples of rasterEngine being used to solve common grid based analyses, including remote sensing image analyses, topographic transformations, and species distribution modeling. With each example, the parallel processing performance results are presented.
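
    rasterEngine itself is an R function; the chunk-parallel pattern it implements can be sketched generically in Python (hypothetical two-band raster and band-ratio model): split the grid into tiles, apply the user function to each tile in a worker pool, and reassemble the result.

        import numpy as np
        from multiprocessing import Pool

        def per_tile_model(tile):
            # Stand-in for a user-supplied model (here, a simple band ratio).
            return (tile[..., 1] - tile[..., 0]) / (tile[..., 1] + tile[..., 0] + 1e-9)

        def raster_engine(raster, fn, n_tiles=8, workers=4):
            tiles = np.array_split(raster, n_tiles, axis=0)   # row-wise chunks
            with Pool(workers) as pool:
                return np.concatenate(pool.map(fn, tiles), axis=0)

        if __name__ == "__main__":
            raster = np.random.rand(1024, 1024, 2)   # hypothetical two-band image
            out = raster_engine(raster, per_tile_model)
            print(out.shape)  # (1024, 1024)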

  10. Analysis of speedup as function of block size and cluster size for parallel feed-forward neural networks on a Beowulf cluster.

    PubMed

    Mörchen, Fabian

    2004-03-01

    The performance of feed-forward neural networks trained with the backpropagation algorithm on a dedicated Beowulf cluster is analyzed. The concept of training-set parallelism is applied. A new model for run-time and speedup prediction is developed. With the model, the speedup and efficiency of one iteration of the neural networks can be estimated as a function of block size and cluster size. The model is applied to three example problems representing different applications and network architectures. The model's estimates are more accurate than traditional methods for run-time estimation and can be calculated efficiently. Experiments show that speedup of one iteration does not necessarily translate to a shorter training time toward a given error level. To overcome this problem, a heuristic extension to training-set parallelism called weight averaging is developed. The results show that training in parallel should only be done on clusters with high-performance network connections or on a multiprocessor machine. A rule of thumb is given for how much network performance the cluster needs to achieve a speedup of the training time for a neural network.
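
    A serial simulation of the weight-averaging heuristic (hypothetical data; a one-layer logistic regression step stands in for backpropagation): each of four simulated workers takes a gradient step on its own data block, and the weights are averaged after every iteration.

        import numpy as np

        def local_step(w, X, y, lr=0.1):
            """One gradient step of logistic regression on a local data block."""
            p = 1.0 / (1.0 + np.exp(-X @ w))
            return w - lr * X.T @ (p - y) / len(y)

        rng = np.random.default_rng(0)
        X = rng.standard_normal((1000, 5))
        y = rng.integers(0, 2, 1000).astype(float)
        w = np.zeros(5)
        for _ in range(50):
            blocks = np.array_split(np.arange(len(y)), 4)   # 4 "workers"
            w = np.mean([local_step(w, X[b], y[b]) for b in blocks], axis=0)
        print(w)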

  11. Parallel Total Energy

    SciTech Connect

    Wang, Lin-Wang

    2004-10-21

    This is a total-energy electronic structure code using the Local Density Approximation (LDA) of density functional theory. It uses plane waves as the wave-function basis set. It can use both norm-conserving pseudopotentials and ultrasoft pseudopotentials. It can relax the atomic positions according to the total energy. It is a parallel code using MPI.

  12. toca-1 is in a novel pathway that functions in parallel with a SUN-KASH nuclear envelope bridge to move nuclei in Caenorhabditis elegans.

    PubMed

    Chang, Yu-Tai; Dranow, Daniel; Kuhn, Jonathan; Meyerzon, Marina; Ngo, Minh; Ratner, Dmitry; Warltier, Karin; Starr, Daniel A

    2013-01-01

    Moving the nucleus to an intracellular location is critical to many fundamental cell and developmental processes, including cell migration, differentiation, fertilization, and establishment of cellular polarity. Bridges of SUN and KASH proteins span the nuclear envelope and mediate many nuclear positioning events, but other pathways function independently through poorly characterized mechanisms. To identify and characterize novel mechanisms of nuclear migration, we conducted a nonbiased forward genetic screen for mutations that enhanced the nuclear migration defect of unc-84, which encodes a SUN protein. In Caenorhabditis elegans larvae, failure of hypodermal P-cell nuclear migration results in uncoordinated and egg-laying-defective animals. The process of P-cell nuclear migration in unc-84 null animals is temperature sensitive; at 25° migration fails in unc-84 mutants, but at 15° the migration occurs normally. We hypothesized that an additional pathway functions in parallel to the unc-84 pathway to move P-cell nuclei at 15°. In support of our hypothesis, forward genetic screens isolated eight emu (enhancer of the nuclear migration defect of unc-84) mutations that disrupt nuclear migration only in a null unc-84 background. The yc20 mutant was determined to carry a mutation in the toca-1 gene. TOCA-1 functions to move P-cell nuclei in a cell-autonomous manner. TOCA-1 is conserved in humans, where it functions to nucleate and organize actin during endocytosis. Therefore, we have uncovered a player in a previously unknown, likely actin-dependent, pathway that functions to move nuclei in parallel to SUN-KASH bridges. The other emu mutations potentially represent other components of this novel pathway.

  13. Ion parallel closures

    NASA Astrophysics Data System (ADS)

    Ji, Jeong-Young; Lee, Hankyu Q.; Held, Eric D.

    2017-02-01

    Ion parallel closures are obtained for arbitrary atomic weights and charge numbers. For arbitrary collisionality, the heat flow and viscosity are expressed as kernel-weighted integrals of the temperature and flow-velocity gradients. Simple, fitted kernel functions are obtained from the 1600 parallel moment solution and the asymptotic behavior in the collisionless limit. The fitted kernel parameters are tabulated for various temperature ratios of ions to electrons. The closures can be used conveniently without solving the kinetic equation or higher order moment equations in closing ion fluid equations.
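
    Schematically, with hypothetical notation (the paper tabulates fitted kernel parameters rather than closed forms), the closures are convolutions in which kernels weight remote gradients along the field line:

        q_\parallel(\ell) \sim -\,n \int K_q(\ell - \ell')\, \frac{\partial T}{\partial \ell'}\, d\ell', \qquad
        \pi_\parallel(\ell) \sim -\,n T \int K_\pi(\ell - \ell')\, \frac{\partial u_\parallel}{\partial \ell'}\, d\ell'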

  14. Left Ventricular Function Evaluation on a 3T MR Scanner with Parallel RF Transmission Technique: Prospective Comparison of Cine Sequences Acquired before and after Gadolinium Injection.

    PubMed

    Caspar, Thibault; Schultz, Anthony; Schaeffer, Mickaël; Labani, Aïssam; Jeung, Mi-Young; Jurgens, Paul Thomas; El Ghannudi, Soraya; Roy, Catherine; Ohana, Mickaël

    To compare cine MR b-TFE sequences acquired before and after gadolinium injection on a 3T scanner with a parallel RF transmission technique, in order to potentially improve scanning-time efficiency when evaluating LV function. 25 consecutive patients scheduled for a cardiac MRI were prospectively included and had their b-TFE cine sequences acquired before and right after gadobutrol injection. Images were assessed qualitatively (overall image quality, LV edge sharpness, artifacts and LV wall motion) and quantitatively, with measurement of LVEF, LV mass, telediastolic volume, and contrast-to-noise ratio (CNR) between the myocardium and the cardiac chamber. Statistical analysis was conducted using a Bayesian paradigm. No difference was found before or after injection for the LVEF, LV mass and telediastolic volume evaluations. Overall image quality and CNR were significantly lower after injection (estimated coefficients, cine after > cine before gadolinium: -1.75, CI = [-3.78; -0.0305], prob(coef>0) = 0%, and -0.23, CI = [-0.49; 0.04], prob(coef>0) = 4%, respectively), but this decrease did not affect the visual assessment of LV wall motion (cine after > cine before gadolinium: -1.46, CI = [-4.72; 1.13], prob(coef>0) = 15%). In 3T cardiac MRI acquired with a parallel RF transmission technique, qualitative and quantitative assessment of LV function can reliably be performed with cine sequences acquired after gadolinium injection, despite a significant decrease in the CNR and the overall image quality.

  15. Left Ventricular Function Evaluation on a 3T MR Scanner with Parallel RF Transmission Technique: Prospective Comparison of Cine Sequences Acquired before and after Gadolinium Injection

    PubMed Central

    Caspar, Thibault; Schultz, Anthony; Schaeffer, Mickaël; Labani, Aïssam; Jeung, Mi-Young; Jurgens, Paul Thomas; El Ghannudi, Soraya; Roy, Catherine; Ohana, Mickaël

    2016-01-01

    Objectives To compare cine MR b-TFE sequences acquired before and after gadolinium injection on a 3T scanner with a parallel RF transmission technique, in order to potentially improve scanning-time efficiency when evaluating LV function. Methods 25 consecutive patients scheduled for a cardiac MRI were prospectively included and had their b-TFE cine sequences acquired before and right after gadobutrol injection. Images were assessed qualitatively (overall image quality, LV edge sharpness, artifacts and LV wall motion) and quantitatively, with measurement of LVEF, LV mass, telediastolic volume, and contrast-to-noise ratio (CNR) between the myocardium and the cardiac chamber. Statistical analysis was conducted using a Bayesian paradigm. Results No difference was found before or after injection for the LVEF, LV mass and telediastolic volume evaluations. Overall image quality and CNR were significantly lower after injection (estimated coefficients, cine after > cine before gadolinium: -1.75, CI = [-3.78; -0.0305], prob(coef>0) = 0%, and -0.23, CI = [-0.49; 0.04], prob(coef>0) = 4%, respectively), but this decrease did not affect the visual assessment of LV wall motion (cine after > cine before gadolinium: -1.46, CI = [-4.72; 1.13], prob(coef>0) = 15%). Conclusions In 3T cardiac MRI acquired with a parallel RF transmission technique, qualitative and quantitative assessment of LV function can reliably be performed with cine sequences acquired after gadolinium injection, despite a significant decrease in the CNR and the overall image quality. PMID:27669571

  16. High-Resolution Functional Mapping of the Venezuelan Equine Encephalitis Virus Genome by Insertional Mutagenesis and Massively Parallel Sequencing

    DTIC Science & Technology

    2010-10-14

    functions of Alphavirus non-structural proteins has been elucidated through molecular and classical genetics studies of two prototypical alphaviruses ... sensitive mutants have been used extensively to elucidate replication and virulence properties of alphaviruses. To demonstrate the utility of our functional ... Alphavirus replication, and has helped to identify the activities and interactions of many viral proteins [10,21,37,38,39,40,41,42,43,44]. We ...

  17. Parallel processor engine model program

    NASA Technical Reports Server (NTRS)

    Mclaughlin, P.

    1984-01-01

    The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.

  18. A study of parallelizing O(N) Green-function-based Monte Carlo method for many fermions coupled with classical degrees of freedom

    NASA Astrophysics Data System (ADS)

    Zhang, Shixun; Yamagia, Shinichi; Yunoki, Seiji

    2013-08-01

    Models of fermions interacting with classical degrees of freedom are applied to a large variety of systems in condensed matter physics. For this class of models, Weiße [Phys. Rev. Lett. 102, 150604 (2009)] has recently proposed a very efficient numerical method, called the O(N) Green-Function-Based Monte Carlo (GFMC) method, where a kernel polynomial expansion technique is used to avoid the full numerical diagonalization of the fermion Hamiltonian matrix of size N, which usually costs O(N^3) computational complexity. Motivated by this background, in this paper we apply the GFMC method to the double exchange model in three spatial dimensions. We mainly focus on the implementation of the GFMC method using both MPI on a CPU-based cluster and Nvidia's Compute Unified Device Architecture (CUDA) programming techniques on a GPU-based (Graphics Processing Unit based) cluster. The time complexity of the algorithm and the parallel implementation details on the clusters are discussed. We also show the performance scaling for increasing Hamiltonian matrix size and increasing number of nodes, respectively. The performance evaluation indicates that for a 32^3 Hamiltonian, a single GPU shows performance equivalent to more than 30 CPU cores parallelized using MPI.
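
    The kernel-polynomial core of such methods can be sketched: Chebyshev moments of a rescaled Hamiltonian are estimated with random probe vectors and a three-term recurrence, avoiding diagonalization. A minimal illustration (toy diagonal Hamiltonian, not the double exchange model):

        import numpy as np

        def chebyshev_moments(H, n_moments=16, n_vectors=10, rng=None):
            """Stochastic estimate of tr T_m(H), m = 0..n_moments-1.
            H must be rescaled so its spectrum lies within [-1, 1]."""
            rng = rng or np.random.default_rng(0)
            n = H.shape[0]
            mu = np.zeros(n_moments)
            for _ in range(n_vectors):
                r = rng.choice([-1.0, 1.0], size=n)   # random +-1 probe vector
                t_prev, t_cur = r, H @ r              # T_0 r and T_1 r
                mu[0] += r @ t_prev
                mu[1] += r @ t_cur
                for m in range(2, n_moments):
                    # Three-term recurrence: T_m = 2 H T_{m-1} - T_{m-2}
                    t_prev, t_cur = t_cur, 2.0 * (H @ t_cur) - t_prev
                    mu[m] += r @ t_cur
            return mu / n_vectors

        H = np.diag(np.linspace(-0.9, 0.9, 200))      # toy rescaled Hamiltonian
        print(chebyshev_moments(H)[:4])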

  19. Exploration of the functional hierarchy of the basal layer of human epidermis at the single-cell level using parallel clonal microcultures of keratinocytes.

    PubMed

    Fortunel, Nicolas O; Cadio, Emmanuelle; Vaigot, Pierre; Chadli, Loubna; Moratille, Sandra; Bouet, Stéphan; Roméo, Paul-Henri; Martin, Michèle T

    2010-04-01

    The basal layer of human epidermis contains both stem cells and keratinocyte progenitors. Because of this cellular heterogeneity, the development of methods suitable for investigations at a clonal level is dramatically needed. Here, we describe a new method that allows multi-parallel clonal cultures of basal keratinocytes. Immediately after extraction from tissue samples, cells are sorted by flow cytometry based on their high integrin-α6 expression and plated individually in microculture wells. This automated cell deposition process enables large-scale characterization of primary clonogenic capacities. The resulting clonal growth profile provided a precise assessment of basal keratinocyte hierarchy, as the size distribution of 14-day-old clones ranged from abortive to highly proliferative clones containing 1.7 x 10^5 keratinocytes (17.4 cell doublings). Importantly, these 14-day-old primary clones could be used to generate three-dimensional reconstructed epidermis with the progeny of a single cell. In long-term cultures, a fraction of highly proliferative clones could sustain extensive expansion of >100 population doublings over 14 weeks and exhibited long-term epidermis reconstruction potency, thus fulfilling candidate stem cell functional criteria. In summary, parallel clonal microcultures provide a relevant model for single-cell studies on interfollicular keratinocytes, which could be also used in other epithelial models, including hair follicle and cornea. The data obtained using this system support the hierarchical model of basal keratinocyte organization in human interfollicular epidermis.

  20. Drawing a high-resolution functional map of adeno-associated virus capsid by massively parallel sequencing.

    PubMed

    Adachi, Kei; Enoki, Tatsuji; Kawano, Yasuhiro; Veraz, Michael; Nakai, Hiroyuki

    2014-01-01

    Adeno-associated virus (AAV) capsid engineering is an emerging approach to advance gene therapy. However, a systematic analysis on how each capsid amino acid contributes to multiple functions remains challenging. Here we show proof-of-principle and successful application of a novel approach, termed AAV Barcode-Seq, that allows us to characterize phenotypes of hundreds of different AAV strains in a high-throughput manner and therefore overcomes technical difficulties in the systematic analysis. In this approach, we generate DNA barcode-tagged AAV libraries and determine a spectrum of phenotypes of each AAV strain by Illumina barcode sequencing. By applying this method to AAV capsid mutant libraries tagged with DNA barcodes, we can draw a high-resolution map of AAV capsid amino acids important for the structural integrity and functions including receptor binding, tropism, neutralization and blood clearance. Thus, Barcode-Seq provides a new tool to generate a valuable resource for virus and gene therapy research.

  1. Functional development of mechanosensitive hair cells in stem cell-derived organoids parallels native vestibular hair cells

    PubMed Central

    Liu, Xiao-Ping; Koehler, Karl R.; Mikosz, Andrew M.; Hashino, Eri; Holt, Jeffrey R.

    2016-01-01

    Inner ear sensory epithelia contain mechanosensitive hair cells that transmit information to the brain through innervation with bipolar neurons. Mammalian hair cells do not regenerate and are limited in number. Here we investigate the potential to generate mechanosensitive hair cells from mouse embryonic stem cells in a three-dimensional (3D) culture system. The system faithfully recapitulates mouse inner ear induction followed by self-guided development into organoids that morphologically resemble inner ear vestibular organs. We find that organoid hair cells acquire mechanosensitivity equivalent to functionally mature hair cells in postnatal mice. The organoid hair cells also progress through a similar dynamic developmental pattern of ion channel expression, reminiscent of two subtypes of native vestibular hair cells. We conclude that our 3D culture system can generate large numbers of fully functional sensory cells which could be used to investigate mechanisms of inner ear development and disease as well as regenerative mechanisms for inner ear repair. PMID:27215798

  2. Increased CD8+ T-cell Function following Castration and Immunization Is Countered by Parallel Expansion of Regulatory T Cells

    PubMed Central

    Tang, Shuai; Moore, Miranda L.; Grayson, Jason M.; Dubey, Purnima

    2013-01-01

    Although androgen ablation therapy is effective in treating primary prostate cancers, a significant number of patients develop incurable castration-resistant disease. Recent studies have suggested a potential synergy between vaccination and androgen ablation, yet the enhanced T-cell function is transient. Using a defined tumor antigen model, UV-8101-RE, we found that concomitant castration significantly increased the frequency and function of antigen-specific CD8+ T cells early after the immunization of wild-type mice. However, at a late time point after immunization, effector function was reduced to the same level as noncastrated mice and was accompanied by a concomitant amplification in CD4+CD25+Foxp3+ regulatory T cells (Treg) following immunization. We investigated whether Treg expansion occurred following castration of prostate tumor–bearing mice. In the prostate-specific Pten−/− mouse model of prostate cancer, we observed an accelerated Treg expansion in mice bearing the castration-resistant endogenous prostate tumor, which prevented effector responses to UV-8101-RE. Treg depletion together with castration elicited a strong CD8+ T-cell response to UV-8101-RE in Pten−/− mice and rescued effector function in castrated and immunized wild-type mice. In addition, Treg expansion in Pten−/− mice was prevented by in vivo interleukin (IL)-2 blockade suggesting that increased IL-2 generated by castration and immunization promotes Treg expansion. Our findings therefore suggest that although effector responses are augmented by castration, the concomitant expansion of Tregs is one mechanism responsible for only transient immune potentiation after androgen ablation. PMID:22374980

  3. Collisionless parallel shocks

    SciTech Connect

    Khabibrakhmanov, I.K. ); Galeev, A.A.; Galinsky, V.L. )

    1993-02-01

    A collisionless parallel shock model is presented which is based on solitary-type solutions of the modified derivative nonlinear Schrödinger equation (MDNLS) for parallel Alfvén waves. We generalize the standard derivative nonlinear Schrödinger equation in order to include the possible anisotropy of the plasma distribution function and higher-order Korteweg-de Vries type dispersion. Stationary solutions of the MDNLS are discussed. The new mechanism of ion reflection from the magnetic mirror of the parallel shock structure, which can be called "adiabatic", is a natural and essential feature of the parallel shock: it introduces irreversible properties into the nonlinear wave structure and may significantly contribute to the plasma heating upstream as well as downstream of the shock. The anisotropic nature of the "adiabatic" reflections leads to asymmetric particle distributions in the upstream as well as the downstream regions of the shock. As a result, a nonzero heat flux appears near the front of the shock. It is shown that this causes stochastic behavior of the nonlinear waves, which can significantly contribute to the shock thermalization. The threshold conditions for the fire-hose and mirror-type instabilities in the downstream and upstream regions are defined by the number of adiabatically reflected ions and thus determine a parameter region in which the described laminar parallel shock structure can exist. 29 refs., 4 figs.

  4. A parallel implementation of the analytic nuclear gradient for time-dependent density functional theory within the Tamm-Dancoff approximation

    NASA Astrophysics Data System (ADS)

    Liu, Fenglai; Gan, Zhengting; Shao, Yihan; Hsu, Chao-Ping; Dreuw, Andreas; Head-Gordon, Martin; Miller, Benjamin T.; Brooks, Bernard R.; Yu, Jian-Guo; Furlani, Thomas R.; Kong, Jing

    2010-10-01

    We derived the analytic gradient for the excitation energies from a time-dependent density functional theory calculation within the Tamm-Dancoff approximation (TDDFT/TDA) using Gaussian atomic orbital basis sets, and introduced an efficient serial and parallel implementation. Some timing results are shown from a B3LYP/6-31G**/SG-1-grid calculation on zincporphyrin. We also performed TDDFT/TDA geometry optimizations for low-lying excited states of 20 small molecules, and compared adiabatic excitation energies and optimized geometry parameters to experimental values using the B3LYP and ωB97 functionals. There are only minor differences between TDDFT and TDA optimized excited state geometries and adiabatic excitation energies. Optimized bond lengths are in better agreement with experiment for both functionals than either CC2 or SOS-CIS(D0), while adiabatic excitation energies are in similar or slightly poorer agreement. Optimized bond angles with both functionals are more accurate than CIS values, but less accurate than either CC2 or SOS-CIS(D0) ones.
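
    For reference, this is the textbook TDA eigenvalue problem whose gradient the paper derives (spin labels and hybrid exact-exchange terms omitted for brevity):

        \mathbf{A}\,\mathbf{X} = \omega\,\mathbf{X}, \qquad
        A_{ia,jb} = \delta_{ij}\,\delta_{ab}\,(\varepsilon_a - \varepsilon_i) + (ia|jb) + (ia|f_{\mathrm{xc}}|jb)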

  5. Resonance line transfer calculations by doubling thin layers. I - Comparison with other techniques. II - The use of the R-parallel redistribution function. [planetary atmospheres

    NASA Technical Reports Server (NTRS)

    Yelle, Roger V.; Wallace, Lloyd

    1989-01-01

    A versatile and efficient technique for the solution of the resonance line scattering problem with frequency redistribution in planetary atmospheres is introduced. Similar to the doubling approach commonly used in monochromatic scattering problems, the technique has been extended to include the frequency dependence of the radiation field. Methods for solving problems with external or internal sources and coupled spectral lines are presented, along with comparison of some sample calculations with results from Monte Carlo and Feautrier techniques. The doubling technique has also been applied to the solution of resonance line scattering problems where the R-parallel redistribution function is appropriate, both neglecting and including polarization as developed by Yelle and Wallace (1989). With the constraint that the atmosphere is illuminated from the zenith, the only difficulty of consequence is that of performing precise frequency integrations over the line profiles. With that problem solved, it is no longer necessary to use the Monte Carlo method to solve this class of problem.

  6. An intangible energy in the functioning biosystem. II: Useful parallels with circuit theory and with non-linear optics.

    PubMed

    Reid, B L

    1995-06-01

    The argument is developed that a structure and function already exist in selected inanimate systems for an intangible energy dissipating through these systems, and that, in so doing, this energy exhibits certain properties readily recognised in the functioning biosystem. The central thesis is that, during dissipation, the structure of the biosystem affords the opportunity for an enhanced display of these properties, so that this structure can be rationally recognised as obligatory in the transition from inanimate to animate matter. The systems chosen are those of reactance in the linear circuit theory of electronics, and some recent developments in non-linear optics, both of which rely on imaginary or quantal force to display observable effects. Discussion covers the fashion in which the development of a statistical formalism as a basis for the study of squeezed states of light in these non-linear systems has, at the same time, overcome a long-standing veto on the practical use of quantal energy associated with the Uncertainty Principle of Heisenberg. These ideas are used to vindicate the suggestion that a theoretical basis is presently available for an engineering-type approach toward an intangible force as it exists in the biosystem. The origins and properties of such a force continue to be considered by many as immersed in mysticism.

  7. Fiber-type distribution in insect leg muscles parallels similarities and differences in the functional role of insect walking legs.

    PubMed

    Godlewska-Hammel, Elzbieta; Büschges, Ansgar; Gruhn, Matthias

    2017-06-08

    Previous studies have demonstrated that myofibrillar ATPase (mATPase) enzyme activity in muscle fibers determines their contraction properties. We analyzed mATPase activities in muscles of the front, middle and hind legs of the orthopteran stick insect (Carausius morosus) to test the hypothesis that differences in muscle fiber types and distributions reflected differences in their behavioral functions. Our data show that all muscles are composed of at least three fiber types, fast, intermediate and slow, and demonstrate that: (1) in the femoral muscles (extensor and flexor tibiae) of all legs, the number of fast fibers decreases from proximal to distal, with a concomitant increase in the number of slow fibers. (2) The swing phase muscles protractor coxae and levator trochanteris, have smaller percentages of slow fibers compared to the antagonist stance muscles retractor coxae and depressor trochanteris. (3) The percentage of slow fibers in the retractor coxae and depressor trochanteris increases significantly from front to hind legs. These results suggest that fiber-type distribution in leg muscles of insects is not identical across leg muscles but tuned towards the specific function of a given muscle in the locomotor system.

  8. Processes setting the structure of the electron distribution function within the exhausts of anti-parallel reconnection

    NASA Astrophysics Data System (ADS)

    Egedal, J.; Wetherton, B.; Daughton, W.; Le, A.

    2016-12-01

    In situ spacecraft observations within the exhausts of magnetic reconnection document a large variation in the velocity space structure of the electron distribution function. Multiple mechanisms help govern the underlying electron dynamics, yielding a range of signatures for collisionless reconnection. These signatures include passing beams of electrons separated by well-defined boundaries from betatron heated/cooled trapped electrons. The present study emphasizes how localized regions of non-adiabatic electron dynamics can mix electrons across the trapped/passing boundaries and impact the form of the electron distributions in the full width of the exhaust. While our study is based on 2D simulations, the described principles shaping the velocity space distributions also apply to 3D geometries making our findings relevant to spacecraft observation of reconnection in the Earth's magnetosphere.

  9. Parallel Computing in Optimization.

    DTIC Science & Technology

    1984-10-01

    include: Heller [1978] and Sameh [1977] (surveys of algorithms), Duff [1983], Fong and Jordan [1977], Jordan [1979], and Rodrigue [1982] (all mainly ...) ... "constrained concave function by partition of feasible domain", Mathematics of Operations Research 8, pp. ... A. Sameh [1977], "Numerical parallel algorithms - a survey", in High Speed Computer and Algorithm Organization, D. Kuck, D. Lawrie, and A. Sameh, eds., Academic Press, pp. 207-228. ... J. Siegel

  10. Parallel FoxP1 and FoxP2 expression in songbird and human brain predicts functional interaction.

    PubMed

    Teramitsu, Ikuko; Kudo, Lili C; London, Sarah E; Geschwind, Daniel H; White, Stephanie A

    2004-03-31

    Humans and songbirds are two of the rare animal groups that modify their innate vocalizations. The identification of FOXP2 as the monogenetic locus of a human speech disorder exhibited by members of the family referred to as KE enables the first examination of whether molecular mechanisms for vocal learning are shared between humans and songbirds. Here, in situ hybridization analyses for FoxP1 and FoxP2 in a songbird reveal a corticostriatal expression pattern congruent with the abnormalities in brain structures of affected KE family members. The overlap in FoxP1 and FoxP2 expression observed in the songbird suggests that combinatorial regulation by these molecules during neural development and within vocal control structures may occur. In support of this idea, we find that FOXP1 and FOXP2 expression patterns in human fetal brain are strikingly similar to those in the songbird, including localization to subcortical structures that function in sensorimotor integration and the control of skilled, coordinated movement. The specific colocalization of FoxP1 and FoxP2 found in several structures in the bird and human brain predicts that mutations in FOXP1 could also be related to speech disorders.

  11. Parallel assessment of male reproductive function in workers and wild rats exposed to pesticides in banana plantations in Guadeloupe.

    PubMed

    Multigner, Luc; Kadhel, Philippe; Pascal, Michel; Huc-Terki, Farida; Kercret, Henri; Massart, Catherine; Janky, Eustase; Auger, Jacques; Jégou, Bernard

    2008-07-30

    There is increasing evidence that reproductive abnormalities are increasing in frequency both in human populations and among wild fauna. This increase is probably related to exposure to toxic contaminants in the environment. The use of sentinel species to raise alarms relating to human reproductive health has been strongly recommended. However, no simultaneous studies at the same site have been carried out in recent decades to evaluate the utility of wild animals for monitoring human reproductive disorders. We carried out a joint study in Guadeloupe assessing the reproductive function of workers exposed to pesticides in banana plantations and of male wild rats living in these plantations. A cross-sectional study was performed to assess semen quality and reproductive hormones in banana workers and in men working in non-agricultural sectors. These reproductive parameters were also assessed in wild rats captured in the plantations and were compared with those in rats from areas not directly polluted by humans. No significant difference in sperm characteristics and/or hormones was found between workers exposed and not exposed to pesticides. By contrast, rats captured in the banana plantations had lower testosterone levels and gonadosomatic indices than control rats. Wild rats seem to be more sensitive than humans to the effects of pesticide exposure on reproductive health. We conclude that the concept of sentinel species must be carefully validated, as the actual nature of exposure may vary between human and wild species, as may the vulnerable time period of exposure and various ecological factors.

  12. Massively parallel patterning of complex 2D and 3D functional polymer brushes by polymer pen lithography.

    PubMed

    Xie, Zhuang; Chen, Chaojian; Zhou, Xuechang; Gao, Tingting; Liu, Danqing; Miao, Qian; Zheng, Zijian

    2014-08-13

    We report the first demonstration of centimeter-area serial patterning of complex 2D and 3D functional polymer brushes by high-throughput polymer pen lithography. Arbitrary 2D and 3D structures of poly(glycidyl methacrylate) (PGMA) brushes are fabricated over areas as large as 2 cm × 1 cm, with a throughput 3 orders of magnitude higher than the state of the art. Patterned PGMA brushes are further employed as resists for fabricating Au micro/nanostructures and as hard molds for the subsequent replica molding of soft stamps. On the other hand, these 2D and 3D PGMA brushes are also utilized as robust and versatile platforms for the immobilization of bioactive molecules to form 2D and 3D patterned DNA oligonucleotide and protein chips. Therefore, this low-cost yet high-throughput "bench-top" serial fabrication method can be readily applied to a wide range of fields including micro/nanofabrication, optics and electronics, smart surfaces, and biorelated studies.

  13. NOCA-1 functions with γ-tubulin and in parallel to Patronin to assemble non-centrosomal microtubule arrays in C. elegans

    PubMed Central

    Wang, Shaohe; Wu, Di; Quintin, Sophie; Green, Rebecca A; Cheerambathur, Dhanya K; Ochoa, Stacy D; Desai, Arshad; Oegema, Karen

    2015-01-01

    Non-centrosomal microtubule arrays assemble in differentiated tissues to perform mechanical and transport-based functions. In this study, we identify Caenorhabditis elegans NOCA-1 as a protein with homology to vertebrate ninein. NOCA-1 contributes to the assembly of non-centrosomal microtubule arrays in multiple tissues. In the larval epidermis, NOCA-1 functions redundantly with the minus end protection factor Patronin/PTRN-1 to assemble a circumferential microtubule array essential for worm growth and morphogenesis. Controlled degradation of a γ-tubulin complex subunit in this tissue revealed that γ-tubulin acts with NOCA-1 in parallel to Patronin/PTRN-1. In the germline, NOCA-1 and γ-tubulin co-localize at the cell surface, and inhibiting either leads to a microtubule assembly defect. γ-tubulin targets independently of NOCA-1, but NOCA-1 targeting requires γ-tubulin when a non-essential putatively palmitoylated cysteine is mutated. These results show that NOCA-1 acts with γ-tubulin to assemble non-centrosomal arrays in multiple tissues and highlight functional overlap between the ninein and Patronin protein families. DOI: http://dx.doi.org/10.7554/eLife.08649.001 PMID:26371552

  14. Parallel assessment of male reproductive function in workers and wild rats exposed to pesticides in banana plantations in Guadeloupe

    PubMed Central

    Multigner, Luc; Kadhel, Philippe; Pascal, Michel; Huc-Terki, Farida; Kercret, Henri; Massart, Catherine; Janky, Eustase; Auger, Jacques; Jégou, Bernard

    2008-01-01

    Background There is increasing evidence that reproductive abnormalities are increasing in frequency both in human populations and among wild fauna. This increase is probably related to exposure to toxic contaminants in the environment. The use of sentinel species to raise alarms relating to human reproductive health has been strongly recommended. However, no simultaneous studies at the same site have been carried out in recent decades to evaluate the utility of wild animals for monitoring human reproductive disorders. We carried out a joint study in Guadeloupe assessing the reproductive function of workers exposed to pesticides in banana plantations and of male wild rats living in these plantations. Methods A cross-sectional study was performed to assess semen quality and reproductive hormones in banana workers and in men working in non-agricultural sectors. These reproductive parameters were also assessed in wild rats captured in the plantations and were compared with those in rats from areas not directly polluted by humans. Results No significant difference in sperm characteristics and/or hormones was found between workers exposed and not exposed to pesticides. By contrast, rats captured in the banana plantations had lower testosterone levels and gonadosomatic indices than control rats. Conclusion Wild rats seem to be more sensitive than humans to the effects of pesticide exposure on reproductive health. We conclude that the concept of sentinel species must be carefully validated, as the actual nature of exposure may vary between human and wild species, as may the vulnerable time period of exposure and various ecological factors. PMID:18667078

  15. Parallel pivoting combined with parallel reduction

    NASA Technical Reports Server (NTRS)

    Alaghband, Gita

    1987-01-01

    Parallel algorithms for the triangularization of large, sparse, and unsymmetric matrices are presented. The method combines parallel reduction with a new parallel pivoting technique, control over the generation of fill-in, and a check for numerical stability, all done in parallel with the work distributed over the active processes. The technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz numbers of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix; it is applied dynamically as the decomposition proceeds.
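
    To make the pivot-selection idea concrete, the sketch below greedily chooses a set of mutually compatible pivots ranked by Markowitz cost. It is a minimal serial illustration in Python assuming a dense NumPy matrix; the function names and the stability tolerance are illustrative assumptions, not code from the paper, whose algorithm performs these steps in parallel during the decomposition.

```python
import numpy as np

def markowitz_cost(A, i, j):
    """Markowitz number of candidate pivot (i, j): (r_i - 1) * (c_j - 1),
    where r_i and c_j count the nonzeros in row i and column j."""
    r = np.count_nonzero(A[i, :])
    c = np.count_nonzero(A[:, j])
    return (r - 1) * (c - 1)

def compatible(A, p, q):
    """Pivots (i1, j1) and (i2, j2) are compatible (eliminable in parallel)
    if they share no row or column and A[i1, j2] == A[i2, j1] == 0."""
    (i1, j1), (i2, j2) = p, q
    return i1 != i2 and j1 != j2 and A[i1, j2] == 0 and A[i2, j1] == 0

def parallel_pivot_set(A, tol=1e-8):
    """Greedily build a set of mutually compatible pivots, preferring small
    Markowitz cost to limit fill-in and skipping tiny entries as a crude
    numerical-stability check (a hypothetical sketch only)."""
    nz = [(i, j) for i, j in zip(*np.nonzero(A)) if abs(A[i, j]) > tol]
    nz.sort(key=lambda ij: markowitz_cost(A, *ij))
    chosen = []
    for cand in nz:
        if all(compatible(A, cand, p) for p in chosen):
            chosen.append(cand)
    return chosen  # these pivots can be eliminated simultaneously
```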

  16. Parallel Programming in the Age of Ubiquitous Parallelism

    NASA Astrophysics Data System (ADS)

    Pingali, Keshav

    2014-04-01

    Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation, in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois, based on these ideas, for exploiting amorphous data-parallelism on multicores and GPUs.

  17. Ultrascalable petaflop parallel supercomputer

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing, including a Torus, a collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. A DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  18. Special parallel processing workshop

    SciTech Connect

    1994-12-01

    This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts of parallel processing.

  19. The electron signature of parallel electric fields

    NASA Astrophysics Data System (ADS)

    Burch, J. L.; Gurgiolo, C.; Menietti, J. D.

    1990-12-01

    Dynamics Explorer I High-Altitude Plasma Instrument electron data are presented. The electron distribution functions have characteristics expected of a region of parallel electric fields. The data are consistent with previous test-particle simulations of observations within parallel electric field regions, which indicate that the typical hole, bump, and loss-cone electron distributions, which contain evidence for parallel potential differences both above and below the point of observation, are not expected to occur in regions containing actual parallel electric fields.

  20. FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

    NASA Astrophysics Data System (ADS)

    Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

    2009-12-01

    FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems.
    Program summary
    Program title: FILMPAR
    Catalogue identifier: AEEL_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 530 421
    No. of bytes in distributed program, including test data, etc.: 1 960 313
    Distribution format: tar.gz
    Programming language: C++ and MPI
    Computer: Desktop, server
    Operating system: Unix/Linux, Mac OS X
    Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors
    RAM: 512 MBytes
    Classification: 12
    External routines: GNU C/C++, MPI
    Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering

  1. Development and study of a parallel algorithm of iteratively forming latent functionally-determined structures for classification and analysis of meteorological data

    NASA Astrophysics Data System (ADS)

    Sorokin, V. A.; Volkov, Yu V.; Sherstneva, A. I.; Botygin, I. A.

    2016-11-01

    This paper overviews a method of generating climate regions based on analytic signal theory. When applied to atmospheric surface-layer temperature data sets, the method forms climatic structures from the corresponding temperature changes, supporting conclusions about the uniformity of climate in an area and allowing climate changes to be traced in time through the analysis of type-group shifts. The algorithm rests on the fact that the frequency spectrum of the thermal oscillation process is narrow-band and has only one mode for most weather stations. This permits the use of analytic signal theory and causality conditions and the introduction of an oscillation phase. The annual component of the phase, being a linear function, was removed by the least squares method. The remaining phase fluctuations can then be studied for coordinated behavior and timing, using the Pearson correlation coefficient to evaluate dependence. The study includes program experiments to evaluate the computational efficiency of the phase-grouping task. The paper also overviews some single-threaded and multi-threaded computing models. It is shown that the phase-grouping algorithm for meteorological data can be parallelized and that a multi-threaded implementation leads to a 25-30% increase in performance.
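
    As an illustration of the phase-extraction step, the following Python sketch computes an instantaneous phase via the analytic signal (Hilbert transform) and removes the linear annual component by least squares. The function name and sampling convention are assumptions for illustration; the paper's actual implementation is not reproduced here.

```python
import numpy as np
from scipy.signal import hilbert

def phase_fluctuations(temps):
    """Extract the oscillation phase of a surface-temperature series via the
    analytic signal, then remove the linear annual trend by least squares
    (a sketch of the method described above; details are assumptions)."""
    x = np.asarray(temps, dtype=float)
    x = x - x.mean()                         # zero-mean series
    analytic = hilbert(x)                    # x + i * H[x]
    phase = np.unwrap(np.angle(analytic))    # continuous instantaneous phase
    t = np.arange(len(x))
    a, b = np.polyfit(t, phase, 1)           # linear (annual) component
    return phase - (a * t + b)               # residual phase fluctuations

# Stations can then be grouped by the Pearson correlation of their residuals:
# r = np.corrcoef(phase_fluctuations(s1), phase_fluctuations(s2))[0, 1]
```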

  2. Executive functioning as a mediator of conduct problems prevention in children of homeless families residing in temporary supportive housing: a parallel process latent growth modeling approach.

    PubMed

    Piehler, Timothy F; Bloomquist, Michael L; August, Gerald J; Gewirtz, Abigail H; Lee, Susanne S; Lee, Wendy S C

    2014-01-01

    A culturally diverse sample of formerly homeless youth (ages 6-12) and their families (n = 223) participated in a cluster randomized controlled trial of the Early Risers conduct problems prevention program in a supportive housing setting. Parents provided 4 annual behaviorally based ratings of executive functioning (EF) and conduct problems: at baseline, over 2 years of intervention programming, and at a 1-year follow-up assessment. Using intent-to-treat analyses, a multilevel latent growth model revealed that the intervention group demonstrated reduced growth in conduct problems over the 4 assessment points. In order to examine mediation, a multilevel parallel process latent growth model was used to simultaneously model growth in EF and growth in conduct problems along with intervention status as a covariate. A significant mediational process emerged, with participation in the intervention promoting growth in EF, which predicted negative growth in conduct problems. The model was consistent with changes in EF fully mediating intervention-related changes in youth conduct problems over the course of the study. These findings highlight the critical role that EF plays in behavioral change and lend further support to its importance as a target in preventive interventions with populations at risk for conduct problems.

  3. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems and the algorithms that may lead to their effective parallelization. Both the forward and backward chained control paradigms were investigated in the course of this work, as was the best computer architecture for the developed algorithms. Two experimental vehicles were developed to facilitate this research: Backpac, a parallel backward chained rule-based reasoning system, and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying future to an expression causes it to be evaluated as a task running in parallel with the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors: an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32-processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines; the Multimax has all its processors hung off a common bus. All are shared-memory machines, but they have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10-processor Encore and the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped-down version of EMYCIN.
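
    Multilisp's future can be approximated in Python with concurrent.futures, which gives the same spawn-then-block-on-the-value semantics. The sketch below is purely illustrative; fire_rule is a hypothetical stand-in for a Backpac/Datapac rule task, not code from those systems.

```python
from concurrent.futures import ThreadPoolExecutor

# Multilisp's (future expr) evaluates expr in a task that runs in parallel
# with the spawning task; touching the value blocks until it is ready.
# executor.submit gives the same placeholder-style semantics in Python.

def fire_rule(rule, facts):
    """Hypothetical rule evaluation standing in for a reasoning task."""
    return [f for f in facts if rule(f)]

with ThreadPoolExecutor() as pool:
    facts = range(100)
    pending = [pool.submit(fire_rule, r, facts)        # spawn parallel tasks
               for r in (lambda f: f % 2 == 0, lambda f: f > 50)]
    results = [p.result() for p in pending]            # block on each "future"
```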

  4. Massively Parallel Genetics.

    PubMed

    Shendure, Jay; Fields, Stanley

    2016-06-01

    Human genetics has historically depended on the identification of individuals whose natural genetic variation underlies an observable trait or disease risk. Here we argue that new technologies now augment this historical approach by allowing the use of massively parallel assays in model systems to measure the functional effects of genetic variation in many human genes. These studies will help establish the disease risk of both observed and potential genetic variants and help overcome the problem of "variants of uncertain significance." Copyright © 2016 by the Genetics Society of America.

  5. Matpar: Parallel Extensions for MATLAB

    NASA Technical Reports Server (NTRS)

    Springer, P. L.

    1998-01-01

    Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.

  7. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper describes recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques, presenting rendering algorithms, developed for massively parallel processors (MPPs), for polygons, spheres, and volumetric data. The polygon algorithm uses a data-parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations of these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  8. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions ensuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  9. Distinguishing serial and parallel parsing.

    PubMed

    Gibson, E; Pearlmutter, N J

    2000-03-01

    This paper discusses ways of determining whether the human parser is serial, maintaining at most one structural interpretation at each parse state, or whether it is parallel, maintaining more than one structural interpretation in at least some circumstances. We make four points. The first two concern counterclaims made by Lewis (2000): (1) that the availability of alternative structures should not vary as a function of the disambiguating material in some ranked parallel models; and (2) that parallel models predict a slowdown during the ambiguous region for more syntactically ambiguous structures. Our other points concern potential methods for seeking experimental evidence relevant to the serial/parallel question. We discuss effects of the plausibility of a secondary structure in the ambiguous region (Pearlmutter & Mendelsohn, 1999) and suggest examining the distribution of reaction times in the disambiguating region.

  10. A parallel group double-blind RCT of vitamin D3 assessing physical function: is the biochemical response to treatment affected by overweight and obesity?

    PubMed

    Wood, A D; Secombes, K R; Thies, F; Aucott, L S; Black, A J; Reid, D M; Mavroeidi, A; Simpson, W G; Fraser, W D; Macdonald, H M

    2014-01-01

    Vitamin D may affect skeletal muscle function. In a double-blind, randomised, placebo-controlled trial, we found that vitamin D3 supplementation (400 or 1,000 I.U. vs. placebo daily for 1 year with bimonthly study visits) does not improve grip strength or reduce falls. This study aimed to test the supplementation effects of vitamin D3 on physical function and examine associations between overweight/obesity and the biochemical response to treatment. In a parallel group double-blind RCT, healthy postmenopausal women from North East Scotland (latitude 57° N) aged 60-70 years (body mass index (BMI), 18-45 kg/m(2)) were assigned (computer randomisation) to daily vitamin D3 (400 I.U. (n = 102)/1,000 I.U. (n = 101)) or matching placebo (n = 102) (97, 96 and 100 participants analysed for outcomes, respectively) from identical coded containers for 1 year. Grip strength (primary outcome), falls, diet, physical activity and ultraviolet B radiation exposure were measured bimonthly, as were serum 25(OH)D, adjusted calcium (ACa) and phosphate. Fat/lean mass (dual energy X-ray absorptiometry), anthropometry, 1,25-dihydroxyvitamin D and parathyroid hormone were measured at baseline and 12 months. Participants and researchers were blinded throughout intervention and analysis. Treatment had no effect on grip strength (mean change (SD)/year = -0.5 (2.5), -0.9 (2.7) and -0.4 (3.3) kg force for the 400 I.U., 1,000 I.U. and placebo groups, respectively (P = .10, ANOVA)) or falls (P = .65, chi-squared test). Biochemical responses were similar across BMI categories (<25, 25-29.99, ≥30 kg/m(2)) with the exception of a small change at 12 months in serum ACa in overweight compared to non-overweight participants (P = .01, ANOVA; 1,000 I.U. group). In the placebo group, 25(OH)D peak concentration change (winter to summer) was negatively associated with weight (r = -.268), BMI (r = -.198), total (r = -.278) and trunk fat mass (r = -.251), with total and trunk fat mass predictive of winter to

  11. Statistics of a parallel Poynting vector in the auroral zone as a function of altitude using Polar EFI and MFE data and Astrid-2 EMMA data

    NASA Astrophysics Data System (ADS)

    Janhunen, P.; Olsson, A.; Tsyganenko, N. A.; Russell, C. T.; Laakso, H.; Blomberg, L. G.

    2005-07-01

    We study the wave-related (AC) and static (DC) parallel Poynting vector (Poynting energy flux) as a function of altitude in auroral field lines using Polar EFI and MFE data. The study is statistical and contains 5 years of data in the altitude range 5000-30,000 km. We verify the low-altitude part of the results by comparison with earlier Astrid-2 EMMA Poynting vector statistics at 1000 km altitude. The EMMA data are also used to statistically compensate the Polar results for the missing zonal electric field component. We compare the Poynting vector with previous statistical DMSP satellite data concerning the electron precipitation power. We find that the AC Poynting vector (Alfvén-wave related Poynting vector) is statistically not sufficient to power auroral electron precipitation, although it may, for Kp>2, power 25-50% of it. The statistical AC Poynting vector also has a stepwise transition at R=4 RE, so that its amplitude increases with increasing altitude. We suggest that this corresponds to Alfvén waves being in Landau resonance with electrons, so that wave-induced electron acceleration takes place at this altitude range, which was earlier named the Alfvén Resonosphere (ARS). The DC Poynting vector is ~3 times larger than electron precipitation and corresponds mainly to ionospheric Joule heating. In the morning sector (02:00-06:00 MLT) we find that the DC Poynting vector has a nontrivial altitude profile such that it decreases by a factor of ~2 when moving upward from 3 to 4 RE radial distance. In other nightside MLT sectors the altitude profile is more uniform. The morning sector nontrivial altitude profile may be due to divergence of the perpendicular Poynting vector field at R=3-4 RE. Keywords. Magnetospheric physics (Auroral phenomena; Magnetosphere-ionosphere interactions) Space plasma physics (Wave-particle interactions)

  12. Parallel flow diffusion battery

    DOEpatents

    Yeh, Hsu-Chi; Cheng, Yung-Sung

    1984-08-07

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  14. Parallel processing ITS

    SciTech Connect

    Fan, W.C.; Halbleib, J.A. Sr.

    1996-09-01

    This report provides a users' guide for parallel processing ITS on a UNIX workstation network, a shared-memory multiprocessor, or a massively-parallel processor. The parallelized version of ITS is based on a master/slave model with message passing. Parallel issues such as random number generation, load balancing, and communication software are briefly discussed. Timing results for example problems are presented for demonstration purposes.

  15. Introduction to parallel programming

    SciTech Connect

    Brawer, S. )

    1989-01-01

    This book describes parallel programming, with all the basic concepts illustrated by examples in a simplified FORTRAN. Concepts covered include: the parallel programming model; the creation of multiple processes; memory sharing; scheduling; and data dependencies. In addition, a number of parallelized applications are presented, including a discrete-time, discrete-event simulator, numerical integration, Gaussian elimination, and parallelized versions of the traveling salesman problem and the exploration of a maze.

  16. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  17. Research in parallel computing

    NASA Technical Reports Server (NTRS)

    Ortega, James M.; Henderson, Charles

    1994-01-01

    This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.

  18. Parallel algorithm development

    SciTech Connect

    Adams, T.F.

    1996-06-01

    Rapid changes in parallel computing technology are causing significant changes in the strategies being used for parallel algorithm development. One approach is simply to write computer code in a standard language like FORTRAN 77 with the expectation that the compiler will produce executable code that will run in parallel. The alternatives are: (1) to build explicit message passing directly into the source code; or (2) to write source code without explicit reference to message passing or parallelism, but use a general communications library to provide efficient parallel execution. Application of these strategies is illustrated with examples of codes currently under development.

  19. Parallel Atomistic Simulations

    SciTech Connect

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated-data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods, such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination, are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.

  20. Parallel Implicit Algorithms for CFD

    NASA Technical Reports Server (NTRS)

    Keyes, David E.

    1998-01-01

    The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSc library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSc during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSc framework.

  1. Multi-slice parallel transmission three-dimensional tailored RF (PTX 3DTRF) pulse design for signal recovery in ultra high field functional MRI

    NASA Astrophysics Data System (ADS)

    Zheng, Hai; Zhao, Tiejun; Qian, Yongxian; Schirda, Claudiu; Ibrahim, Tamer S.; Boada, Fernando E.

    2013-03-01

    T2∗-weighted fMRI at high and ultra high field (UHF) is often hampered by susceptibility-induced, through-plane, signal loss. Three-dimensional tailored RF (3DTRF) pulses have been shown to be an effective approach for mitigating through-plane signal loss at UHF. However, the required RF pulse lengths are too long for practical applications. Recently, parallel transmission (PTX) has emerged as a very effective means for shortening the RF pulse duration for 3DTRF without sacrificing the excitation performance. In this article, we demonstrate an RF pulse design strategy for 3DTRF based on the use of multi-slice PTX 3DTRF to simultaneously and precisely recover signal with whole-brain coverage. Phantom and human experiments on three subjects, using an eight-channel whole-body parallel transmission system, demonstrate the effectiveness and robustness of the proposed method.

  2. Parallel digital forensics infrastructure.

    SciTech Connect

    Liebrock, Lorie M.; Duggan, David Patrick

    2009-10-01

    This report documents the architecture and implementation of a Parallel Digital Forensics (PDF) infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics.

  3. Linearly exact parallel closures for slab geometry

    NASA Astrophysics Data System (ADS)

    Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

    2013-08-01

    Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).

  4. Introduction to Parallel Computing

    DTIC Science & Technology

    1992-05-01

    [Hardware/software summary table garbled in extraction; it listed languages (C, Ada, C++, data-parallel FORTRAN, FORTRAN-90 (late 1992)), a 2D mesh topology of node boards with one application processor per board, and development tools.] As parallel machines become the wave of the present, tools are increasingly needed to assist programmers in creating parallel tasks and coordinating their activities. Linda was designed to be such a tool. Linda was designed with three important goals in mind: to be portable, efficient, and easy to use.

  5. Parallel Wolff Cluster Algorithms

    NASA Astrophysics Data System (ADS)

    Bae, S.; Ko, S. H.; Coddington, P. D.

    The Wolff single-cluster algorithm is the most efficient method known for Monte Carlo simulation of many spin models. Due to the irregular size, shape and position of the Wolff clusters, this method does not easily lend itself to efficient parallel implementation, so that simulations using this method have thus far been confined to workstations and vector machines. Here we present two parallel implementations of this algorithm, and show that one gives fairly good performance on a MIMD parallel computer.
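
    For reference, a minimal serial sketch of the Wolff single-cluster update for the 2D Ising model is shown below in Python. The paper's contribution is parallelizing this irregular cluster growth, which the sketch does not attempt; all parameter values are illustrative.

```python
import numpy as np

def wolff_update(spins, beta, rng):
    """One Wolff single-cluster flip for the 2D Ising model (J = 1) with
    periodic boundaries; a serial sketch of the update discussed above."""
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 * beta)             # bond-activation probability
    seed = (int(rng.integers(L)), int(rng.integers(L)))
    s0 = spins[seed]
    cluster, stack = {seed}, [seed]
    while stack:                                   # grow the cluster from the seed
        i, j = stack.pop()
        for nb in (((i - 1) % L, j), ((i + 1) % L, j),
                   (i, (j - 1) % L), (i, (j + 1) % L)):
            if nb not in cluster and spins[nb] == s0 and rng.random() < p_add:
                cluster.add(nb)
                stack.append(nb)
    for site in cluster:                           # flip the whole cluster at once
        spins[site] *= -1
    return len(cluster)

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(32, 32))
for _ in range(100):
    wolff_update(spins, beta=0.44, rng=rng)        # near the critical coupling
```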

  6. Application Portable Parallel Library

    NASA Technical Reports Server (NTRS)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another: user develops application program once and then easily moves it from the parallel computer on which it was created to another parallel computer ("parallel computer" here also includes a heterogeneous collection of networked computers). Written in C language, with one FORTRAN 77 subroutine for UNIX-based computers, and callable from application programs written in C language or FORTRAN 77.

  8. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS, with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear speedups in some cases, are possible.

  9. Parallel Algorithms and Patterns

    SciTech Connect

    Robey, Robert W.

    2016-06-16

    This is a PowerPoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include sorting, searching, optimization, and matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are reductions, prefix scans, and ghost cell updates. We only touch on parallel patterns in this presentation; the topic really deserves its own detailed discussion, which Gabe Rockefeller would like to develop.

  10. A parallel variable metric optimization algorithm

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.

    1973-01-01

    An algorithm designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. If p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariate minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
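
    The parallel step of a cycle, evaluating the function and its gradient at p points concurrently, might look like the Python sketch below. The quadratic objective and all names are illustrative assumptions, not the paper's code; the metric update and line search are omitted.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def f_and_grad(x):
    """Example objective: a quadratic f(x) = x.T A x / 2 with known gradient
    (illustrative only; the algorithm targets general homogeneous f)."""
    A = np.diag(np.arange(1.0, len(x) + 1))
    return 0.5 * x @ A @ x, A @ x

def parallel_evaluations(points):
    """Step 1 of a cycle: evaluate the function and gradient at p points
    concurrently, mirroring the vector-streaming step described above."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(f_and_grad, points))

if __name__ == "__main__":
    p, n = 4, 8                                   # degree of parallelism, dimension
    rng = np.random.default_rng(0)
    results = parallel_evaluations([rng.random(n) for _ in range(p)])
```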

  11. Social Problems and Deviance: Some Parallel Issues

    ERIC Educational Resources Information Center

    Kitsuse, John I.; Spector, Malcolm

    1975-01-01

    Explores parallel developments in labeling theory and in the value conflict approach to social problems. Similarities in their critiques of functionalism and etiological theory as well as their emphasis on the definitional process are noted. (Author)

  12. Parallel and Distributed Computing.

    DTIC Science & Technology

    1986-12-12

    program was devoted to parallel and distributed computing. Support for this part of the program was obtained from the present Army contract and a...Umesh Vazirani. A workshop on parallel and distributed computing was held from May 19 to May 23, 1986 and drew 141 participants. Keywords: Mathematical programming; Protocols; Randomized algorithms. (Author)

  13. Parallel Lisp simulator

    SciTech Connect

    Weening, J.S.

    1988-05-01

    CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.

  14. Parallels in History.

    ERIC Educational Resources Information Center

    Mugleston, William F.

    2000-01-01

    Believes that by focusing on the recurrent situations and problems, or parallels, throughout history, students will understand the relevance of history to their own times and lives. Provides suggestions for parallels in history that may be introduced within lectures or as a means to class discussions. (CMK)

  15. Parallel computing works

    SciTech Connect

    Not Available

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five-year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  16. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
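
    A minimal sketch of one common parallelization, a segmented sieve whose blocks are sieved independently by worker processes, is shown below in Python. The segment count, names, and decomposition are illustrative assumptions; this is not the hypercube implementation described above.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from math import isqrt

def sieve_segment(args):
    """Mark composites in [lo, hi) using the shared base primes (<= sqrt(N))."""
    lo, hi, base_primes = args
    flags = np.ones(hi - lo, dtype=bool)
    for p in base_primes:
        start = max(p * p, ((lo + p - 1) // p) * p)   # first multiple >= lo
        flags[start - lo:hi - lo:p] = False
    if lo == 0:
        flags[:2] = False                              # 0 and 1 are not prime
    return [lo + i for i in np.nonzero(flags)[0]]

def parallel_sieve(n, segments=8):
    """Primes below n via a segmented Sieve of Eratosthenes with segments
    sieved in parallel (a sketch of the scattered-decomposition idea)."""
    base = [p for p in range(2, isqrt(n) + 1)
            if all(p % q for q in range(2, isqrt(p) + 1))]
    bounds = np.linspace(0, n, segments + 1, dtype=int)
    jobs = [(int(bounds[k]), int(bounds[k + 1]), base) for k in range(segments)]
    with ProcessPoolExecutor() as pool:
        return [p for seg in pool.map(sieve_segment, jobs) for p in seg]

if __name__ == "__main__":
    primes = parallel_sieve(1_000_000)
```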

  17. Totally parallel multilevel algorithms

    NASA Technical Reports Server (NTRS)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT-based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  18. Genetic algorithms using SISAL parallel programming language

    SciTech Connect

    Tejada, S.

    1994-05-06

    Genetic algorithms are a mathematical optimization technique developed by John Holland at the University of Michigan [1]. The SISAL programming language possesses many of the characteristics desired to implement genetic algorithms. SISAL is a deterministic, functional programming language which is inherently parallel. Because SISAL is functional and based on mathematical concepts, genetic algorithms can be efficiently translated into the language. Several of the steps involved in genetic algorithms, such as mutation, crossover, and fitness evaluation, can be parallelized using SISAL. In this paper I discuss the implementation and performance of parallel genetic algorithms in SISAL.
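
    Since no SISAL code is reproduced here, the following Python sketch illustrates the same idea: the fitness evaluations of each generation are mapped over a process pool while selection, crossover, and mutation stay serial. The OneMax fitness and all parameters are illustrative assumptions.

```python
import random
from concurrent.futures import ProcessPoolExecutor

def fitness(genome):
    """Toy fitness: count of 1-bits (OneMax); stands in for any costly
    evaluation that a functional language could apply in parallel."""
    return sum(genome)

def evolve(pop_size=64, length=32, generations=50):
    """Generational GA: fitness evaluation is the parallel step (a sketch)."""
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    with ProcessPoolExecutor() as pool:
        for _ in range(generations):
            scores = list(pool.map(fitness, pop))        # parallel step
            ranked = [g for _, g in sorted(zip(scores, pop), reverse=True)]
            parents = ranked[:pop_size // 2]
            children = []
            while len(children) < pop_size:
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, length)
                child = a[:cut] + b[cut:]                 # one-point crossover
                if random.random() < 0.05:                # mutation
                    i = random.randrange(length)
                    child[i] ^= 1
                children.append(child)
            pop = children
    return max(pop, key=fitness)

if __name__ == "__main__":
    best = evolve()
```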

  19. Trajectories in parallel optics.

    PubMed

    Klapp, Iftach; Sochen, Nir; Mendlovic, David

    2011-10-01

    In our previous work we showed the ability to improve the optical system's matrix condition by optical design, thereby improving its robustness to noise. It was shown that by using singular value decomposition, a target point-spread function (PSF) matrix can be defined for an auxiliary optical system, which works parallel to the original system to achieve such an improvement. In this paper, after briefly introducing the all optics implementation of the auxiliary system, we show a method to decompose the target PSF matrix. This is done through a series of shifted responses of auxiliary optics (named trajectories), where a complicated hardware filter is replaced by postprocessing. This process manipulates the pixel confined PSF response of simple auxiliary optics, which in turn creates an auxiliary system with the required PSF matrix. This method is simulated on two space variant systems and reduces their system condition number from 18,598 to 197 and from 87,640 to 5.75, respectively. We perform a study of the latter result and show significant improvement in image restoration performance, in comparison to a system without auxiliary optics and to other previously suggested hybrid solutions. Image restoration results show that in a range of low signal-to-noise ratio values, the trajectories method gives a significant advantage over alternative approaches. A third space invariant study case is explored only briefly, and we present a significant improvement in the matrix condition number from 1.9160e+013 to 34,526.

  20. Improving the spatial accuracy in functional magnetic resonance imaging (fMRI) based on the blood oxygenation level dependent (BOLD) effect: benefits from parallel imaging and a 32-channel head array coil at 1.5 Tesla.

    PubMed

    Fellner, C; Doenitz, C; Finkenzeller, T; Jung, E M; Rennert, J; Schlaier, J

    2009-01-01

    Geometric distortions and low spatial resolution are current limitations in functional magnetic resonance imaging (fMRI). The aim of this study was to evaluate if application of parallel imaging or significant reduction of voxel size in combination with a new 32-channel head array coil can reduce those drawbacks at 1.5 T for a simple hand motor task. Therefore, maximum t-values (tmax) in different regions of activation, time-dependent signal-to-noise ratios (SNR(t)) as well as distortions within the precentral gyrus were evaluated. Comparing fMRI with and without parallel imaging in 17 healthy subjects revealed significantly reduced geometric distortions in anterior-posterior direction. Using parallel imaging, tmax only showed a mild reduction (7-11%) although SNR(t) was significantly diminished (25%). In 7 healthy subjects high-resolution (2 x 2 x 2 mm3) fMRI was compared with standard fMRI (3 x 3 x 3 mm3) in a 32-channel coil and with high-resolution fMRI in a 12-channel coil. The new coil yielded a clear improvement for tmax (21-32%) and SNR(t) (51%) in comparison with the 12-channel coil. Geometric distortions were smaller due to the smaller voxel size. Therefore, the reduction in tmax (8-16%) and SNR(t) (52%) in the high-resolution experiment seems to be tolerable with this coil. In conclusion, parallel imaging is an alternative to reduce geometric distortions in fMRI at 1.5 T. Using a 32-channel coil, reduction of the voxel size might be the preferable way to improve spatial accuracy.

  1. Parallel Computational Protein Design

    PubMed Central

    Zhou, Yichao; Donald, Bruce R.; Zeng, Jianyang

    2016-01-01

    Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that guarantees to find the global minimum energy solution (GMEC) is to combine both dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computation bottleneck of large-scale computational protein design. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab [1] to implement a GPU-based massively parallel A* algorithm for improving the protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedups in large protein design cases with a small memory overhead compared to the traditional A* search algorithm implementation, while still guaranteeing optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle problems in which the conformation space is too large and the global optimal solution could not previously be computed. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with state-of-the-art rotamer pruning algorithms such as iMinDEE [2] and DEEPer [3] to also consider continuous backbone and side-chain flexibility. PMID:27914056
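
    For readers unfamiliar with the underlying search, a generic serial A* skeleton is sketched below in Python. gOSPREY's contribution is a GPU-parallel version of this expansion loop with protein-design-specific heuristics; none of that specialization appears in this illustrative sketch.

```python
import heapq
from itertools import count

def astar(start, neighbors, h, is_goal):
    """Generic A*: repeatedly expand the node with the smallest f = g + h.
    The counter breaks ties so heap entries never compare node objects."""
    tie = count()
    best_g = {start: 0.0}
    frontier = [(h(start), next(tie), 0.0, start)]
    while frontier:
        f, _, g, node = heapq.heappop(frontier)
        if g > best_g.get(node, float("inf")):
            continue                                # stale queue entry
        if is_goal(node):
            return node, g                          # optimal for admissible h
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), next(tie), g2, nxt))
    return None, float("inf")
```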

  2. Bilingual parallel programming

    SciTech Connect

    Foster, I.; Overbeek, R.

    1990-01-01

    Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.

  3. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  4. PALM: a Parallel Dynamic Coupler

    NASA Astrophysics Data System (ADS)

    Thevenin, A.; Morel, T.

    2008-12-01

    In order to represent complex systems efficiently, numerical modeling has to rely on many physical models at a time: an ocean model coupled with an atmospheric model is at the basis of climate modeling. The continuity of the solution is granted only if these models can constantly exchange information. PALM is a coupler allowing the concurrent execution and intercommunication of programs that were not especially designed for coupling. With PALM, a dynamic coupling approach is introduced: a coupled component can be launched, and can release a computer's resources upon termination, at any moment during the simulation. To exploit a computer's capabilities as fully as possible, the PALM coupler handles two levels of parallelism. The first level concerns the components themselves. While managing the resources, PALM allocates the number of processes necessary to any coupled component. These components can be parallel programs based on domain decomposition with MPI or applications multithreaded with OpenMP. The second level of parallelism is task parallelism: one can define a coupling algorithm allowing two or more programs to be executed in parallel. PALM applications are implemented via a Graphical User Interface called PrePALM. In this GUI, the programmer initially defines the coupling algorithm and then describes the actual communications between the models. PALM offers very high flexibility for testing different coupling techniques and for reaching the best load balance in a high performance computer. The transformation of computationally independent code is almost straightforward. The other qualities of PALM are its easy set-up, its flexibility, its performance, the simple updates and evolutions of the coupled application, and the many side services and functions that it offers.

  5. Parallel implicit Monte Carlo in C++

    SciTech Connect

    Urbatsch, T.J.; Evans, T.M.

    1998-12-31

    The authors are developing a parallel C++ Implicit Monte Carlo code in the Draco framework. As background and motivation for the parallelization strategy, they first present three basic parallelization schemes. They use three hypothetical examples, mimicking the memory constraints of the real world, to examine characteristics of the basic schemes. Next, they present a two-step scheme proposed by Lawrence Livermore National Laboratory (LLNL). The two-step parallelization scheme they develop is based upon LLNL's two-step scheme and appears to have greater potential than both the basic schemes and LLNL's scheme. Lastly, they explain the code design and describe how the functionality of C++ and the Draco framework assists the development of a parallel code.

  6. NESL: A nested data-parallel language (version 2.6)

    SciTech Connect

    Blelloch, G.E.

    1993-04-01

    This report describes NESL, a strongly-typed, applicative, data-parallel language. NESL is intended to be used as a portable interface for programming a variety of parallel and vector supercomputers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on sequences (ordered sets), including a mechanism for applying any function over the elements of a sequence in parallel and a rich set of parallel functions that manipulate sequences. NESL fully supports nested sequences and nested parallelism - the ability to take a parallel function and apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with complex and dynamically changing data structures, such as required in many graph and sparse matrix algorithms. NESL also provides a mechanism for calculating the asymptotic running time for a program on various parallel machine models, including the parallel random access machine (PRAM). This is useful for estimating running times of algorithms on actual machines and, when teaching algorithms, for supplying a close correspondence between the code and the theoretical complexity.
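
    A rough Python analogue of NESL's nested apply-to-each is sketched below for a sparse matrix-vector product: the outer comprehension maps over rows in parallel while the inner comprehension maps over a row's nonzeros. In NESL both levels would be parallel; here only the outer level is, and the row encoding is an illustrative assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# NESL would write the product with nested apply-to-each, roughly:
#   { sum({ v * x[i] : (i, v) in row }) : row in rows }

def row_dot(row, x):
    return sum(v * x[i] for i, v in row)     # inner map (parallel in NESL)

rows = [[(0, 2.0), (2, 1.0)],                # sparse rows as (index, value)
        [(1, 3.0)],
        [(0, 1.0), (1, 1.0), (2, 1.0)]]
x = [1.0, 2.0, 3.0]

with ThreadPoolExecutor() as pool:
    y = list(pool.map(lambda r: row_dot(r, x), rows))   # outer parallel map
# y == [5.0, 6.0, 6.0]
```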

  7. The Parallel Axiom

    ERIC Educational Resources Information Center

    Rogers, Pat

    1972-01-01

    Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclid's parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)

  8. Parallels with nature

    NASA Astrophysics Data System (ADS)

    2014-10-01

    Adam Nelson and Stuart Warriner, from the University of Leeds, talk with Nature Chemistry about their work to develop viable synthetic strategies for preparing new chemical structures in parallel with the identification of desirable biological activity.

  9. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

  10. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-09-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, a set of tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory at info.mcs.anl.gov.

  12. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low-cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); and (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, while other TCPs running in parallel provide high bandwidth).
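
    A toy sketch can make the striping idea in conclusion (2) concrete. In the hypothetical Python below, thread-fed queues stand in for the physical channels: chunks of a message are round-robined across the channels with sequence numbers and reassembled in order on the receiving side. All names and sizes are invented for illustration.

    ```python
    # Toy illustration of coarse-grain parallel networking: stripe one
    # message across n simulated channels and reassemble it in order.
    # Real implementations would use n sockets over n physical links;
    # queues stand in for the channels here.
    import queue
    import threading

    N_CHANNELS = 4

    def channel_worker(inbox, outbox):
        # Each "protocol processor" forwards stripes over its own channel.
        while True:
            item = inbox.get()
            if item is None:
                break
            outbox.put(item)  # (sequence number, chunk) pairs

    def send_striped(data, chunk=8):
        inboxes = [queue.Queue() for _ in range(N_CHANNELS)]
        outbox = queue.Queue()
        workers = [threading.Thread(target=channel_worker, args=(q, outbox))
                   for q in inboxes]
        for w in workers:
            w.start()
        chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        for seq, c in enumerate(chunks):
            inboxes[seq % N_CHANNELS].put((seq, c))   # round-robin striping
        for q in inboxes:
            q.put(None)
        for w in workers:
            w.join()
        received = [outbox.get() for _ in chunks]
        return b"".join(c for _, c in sorted(received))  # reorder by sequence

    assert send_striped(b"0123456789" * 10) == b"0123456789" * 10
    ```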

  13. Revisiting and parallelizing SHAKE

    NASA Astrophysics Data System (ADS)

    Weinbach, Yael; Elber, Ron

    2005-10-01

    An algorithm is presented for running SHAKE in parallel. SHAKE is a widely used approach to compute molecular dynamics trajectories with constraints. An essential step in SHAKE is the solution of a sparse linear problem of the type Ax = b, where x is a vector of unknowns. Conjugate gradient minimization (that can be done in parallel) replaces the widely used iteration process that is inherently serial. Numerical examples present good load balancing and are limited only by communication time.
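
    The kernel in question is standard conjugate gradient. A minimal dense numpy sketch is given below; in SHAKE itself A would be the sparse constraint-coupling matrix, and the matrix-vector product and dot products are the operations that parallelize.

    ```python
    # Minimal conjugate gradient for a symmetric positive-definite
    # system Ax = b, the kernel the abstract proposes for parallelizing
    # SHAKE. The dot products and the matrix-vector product are the
    # parallelizable steps.
    import numpy as np

    def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
        x = np.zeros_like(b)
        r = b - A @ x
        p = r.copy()
        rs = r @ r
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rs / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    # Example on a small SPD matrix:
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b))  # ~[0.0909, 0.6364]
    ```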

  14. Parallel image compression

    NASA Technical Reports Server (NTRS)

    Reif, John H.

    1987-01-01

    A parallel compression algorithm for the 16,384-processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
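
    The vector-quantization component of such a scheme reduces, per image block, to a nearest-codeword search, which each MPP processor can perform independently on its own blocks. A toy numpy sketch with a random codebook (all sizes arbitrary) follows.

    ```python
    # Sketch of the vector-quantization step: each image block is
    # replaced by the index of its nearest codebook vector. On the MPP,
    # each processor would encode its own blocks independently.
    import numpy as np

    rng = np.random.default_rng(0)
    codebook = rng.random((16, 4))          # 16 codewords of dimension 4
    blocks = rng.random((100, 4))           # 100 image blocks to encode

    # Nearest codeword per block (squared Euclidean distance).
    d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    indices = d.argmin(axis=1)              # compressed representation
    reconstruction = codebook[indices]      # lossy decode
    print(indices[:10])
    ```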

  15. A Parallel Product-Convolution approach for representing the depth varying Point Spread Functions in 3D widefield microscopy based on principal component analysis.

    PubMed

    Arigovindan, Muthuvel; Shaevitz, Joshua; McGowan, John; Sedat, John W; Agard, David A

    2010-03-29

    We address the problem of computational representation of image formation in 3D widefield fluorescence microscopy with depth-varying spherical aberrations. We first represent 3D depth-dependent point spread functions (PSFs) as a weighted sum of basis functions that are obtained by principal component analysis (PCA) of experimental data. This representation is then used to derive an approximating structure that compactly expresses the depth-variant response as a sum of a few depth-invariant convolutions pre-multiplied by a set of 1D depth functions, where the convolving functions are the PCA-derived basis functions. The model offers an efficient and convenient trade-off between complexity and accuracy. For a given number of approximating PSFs, the proposed method results in much better accuracy than the strata-based approximation scheme that is currently used in the literature. In addition to yielding better accuracy, the proposed methods automatically eliminate the noise in the measured PSFs.
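
    The approximating structure can be written g(x, z) ≈ Σ_k B_k ⊛ (a_k(z) · f), with B_k the PCA-derived basis PSFs and a_k(z) the 1D depth functions. The following sketch applies that model directly; the object, basis PSFs, and depth weights are synthetic stand-ins, and NumPy and SciPy are assumed available.

    ```python
    # Sketch of the product-convolution approximation: a depth-varying
    # blur applied as a small sum of depth-invariant convolutions, each
    # premultiplied by a 1D depth weight a_k(z).
    import numpy as np
    from scipy.ndimage import convolve

    K, Z, X = 3, 32, 64                    # components, depths, lateral size
    rng = np.random.default_rng(1)
    f = rng.random((Z, X))                 # synthetic object
    B = rng.random((K, 5, 5))              # basis PSFs (depth-invariant)
    a = rng.random((K, Z))                 # 1D depth weights a_k(z)

    g = np.zeros_like(f)
    for k in range(K):
        weighted = a[k][:, None] * f       # premultiply by depth function
        g += convolve(weighted, B[k], mode="constant")
    print(g.shape)
    ```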

  16. Effect of aerobic exercise on peripheral nerve functions of population with diabetic peripheral neuropathy in type 2 diabetes: a single blind, parallel group randomized controlled trial.

    PubMed

    Dixit, Snehil; Maiya, Arun G; Shastry, B A

    2014-01-01

    To evaluate the effect of moderate-intensity aerobic exercise (40%-60% of Heart Rate Reserve (HRR)) on diabetic peripheral neuropathy, a parallel-group, randomized controlled trial was carried out in a tertiary health care setting in India. The study comprised an experimental group (moderate-intensity aerobic exercise and standard care) and a control group (standard care). A population with type 2 diabetes and clinical neuropathy, defined as a minimum score of seven on the Michigan Diabetic Neuropathy Score (MDNS), was randomly assigned to the experimental and control groups by computer-generated random number tables. Repeated-measures ANOVA (RANOVA) was used for data analysis (p<0.05 was considered significant). A total of 87 patients with DPN were evaluated in the study. After randomization there were 47 patients in the control group and 40 patients in the experimental group. A comparison of the two groups using RANOVA for anthropometric measures showed no significant change at eight weeks. For the conduction velocity of the distal peroneal nerve there was a significant difference between the two groups at eight weeks (degrees of freedom (Df)=1, 62; F=5.14; p=0.03). The sural sensory nerve also showed a significant between-group difference in conduction velocity at eight weeks (Df=1, 60; F=10.16; p=0.00). Significant differences in mean MDNS scores were also observed between the two groups at eight weeks (p<0.05). Moderate-intensity aerobic exercise can play a valuable role in disrupting the normal progression of DPN in type 2 diabetes. Copyright © 2014 Elsevier Inc. All rights reserved.

  17. HOPSPACK: Hybrid Optimization Parallel Search Package.

    SciTech Connect

    Gray, Genetha Anne.; Kolda, Tamara G.; Griffin, Joshua; Taddy, Matt; Martinez-Canales, Monica L.

    2008-12-01

    In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel Search Package), a new software platform which facilitates combining multiple optimization routines into a single, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The framework is designed such that existing optimization source code can be easily incorporated with minimal code modification. By maintaining the integrity of each individual solver, the strengths and code sophistication of the original optimization package are retained and exploited.

  18. Sublattice parallel replica dynamics

    NASA Astrophysics Data System (ADS)

    Martínez, Enrique; Uberuaga, Blas P.; Voter, Arthur F.

    2014-06-01

    Exascale computing presents a challenge for the scientific community as new algorithms must be developed to take full advantage of the new computing paradigm. Atomistic simulation methods that offer full fidelity to the underlying potential, i.e., molecular dynamics (MD) and parallel replica dynamics, fail to use the whole machine speedup, leaving a region in time and sample size space that is unattainable with current algorithms. In this paper, we present an extension of the parallel replica dynamics algorithm [A. F. Voter, Phys. Rev. B 57, R13985 (1998), 10.1103/PhysRevB.57.R13985] by combining it with the synchronous sublattice approach of Shim and Amar [Y. Shim and J. G. Amar, Phys. Rev. B 71, 125432 (2005), 10.1103/PhysRevB.71.125432], thereby exploiting event locality to improve the algorithm scalability. This algorithm is based on a domain decomposition in which events happen independently in different regions in the sample. We develop an analytical expression for the speedup given by this sublattice parallel replica dynamics algorithm and compare it with parallel MD and traditional parallel replica dynamics. We demonstrate how this algorithm, which introduces a slight additional approximation of event locality, enables the study of physical systems unreachable with traditional methodologies and promises to better utilize the resources of current high performance and future exascale computers.

  19. Parallel time integration software

    SciTech Connect

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieving parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.
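
    MGRIT itself is not reproduced here, but its two-level special case is essentially the Parareal iteration, which a few lines of Python can illustrate on the scalar test problem y' = -y. The coarse propagator G and fine propagator F below are simple backward-Euler choices made for the sketch.

    ```python
    # Two-level illustration of parallel-in-time correction (the
    # Parareal iteration) for y' = lam*y with lam = -1. The F
    # evaluations on each time slice are the parallelizable work.
    import numpy as np

    lam, T, N = -1.0, 2.0, 10
    dT = T / N

    def G(y, dt):                      # coarse: one backward-Euler step
        return y / (1.0 - lam * dt)

    def F(y, dt, m=20):                # fine: m backward-Euler substeps
        for _ in range(m):
            y = y / (1.0 - lam * dt / m)
        return y

    y = np.empty(N + 1); y[0] = 1.0
    for n in range(N):                 # initial serial coarse sweep
        y[n + 1] = G(y[n], dT)

    for k in range(5):                 # Parareal corrections
        Fy = np.array([F(y[n], dT) for n in range(N)])  # parallel in n
        y_new = np.empty_like(y); y_new[0] = 1.0
        for n in range(N):             # serial coarse correction
            y_new[n + 1] = G(y_new[n], dT) + Fy[n] - G(y[n], dT)
        y = y_new

    print(y[-1], np.exp(lam * T))      # converges toward exp(-2)
    ```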

  20. Parallel architectures for vision

    SciTech Connect

    Maresca, M.; Lavin, M.A.; Li, H.

    1988-08-01

    Vision computing involves the execution of a large number of operations on large sets of structured data. Sequential computers cannot achieve the speed required by most of the current applications and therefore parallel architectural solutions have to be explored. In this paper the authors examine the options that drive the design of a vision oriented computer, starting with the analysis of the basic vision computation and communication requirements. They briefly review the classical taxonomy for parallel computers, based on the multiplicity of the instruction and data stream, and apply a recently proposed criterion, the degree of autonomy of each processor, to further classify fine-grain SIMD massively parallel computers. They identify three types of processor autonomy, namely operation autonomy, addressing autonomy, and connection autonomy. For each type they give the basic definitions and show some examples. They focus on the concept of connection autonomy, which they believe is a key point in the development of massively parallel architectures for vision. They show two examples of parallel computers featuring different types of connection autonomy - the Connection Machine and the Polymorphic-Torus - and compare their cost and benefit.

  1. Parallelism in integrated fluidic circuits

    NASA Astrophysics Data System (ADS)

    Bousse, Luc J.; Kopf-Sill, Anne R.; Parce, J. W.

    1998-04-01

    Many research groups around the world are working on integrated microfluidics. The goal of these projects is to automate and integrate the handling of liquid samples and reagents for measurement and assay procedures in chemistry and biology. Ultimately, it is hoped that this will lead to a revolution in chemical and biological procedures similar to that caused in electronics by the invention of the integrated circuit. The optimal size scale of channels for liquid flow is determined by basic constraints to be somewhere between 10 and 100 micrometers. In larger channels, mixing by diffusion takes too long; in smaller channels, the number of molecules present is so low that it makes detection difficult. At Caliper, we are making fluidic systems in glass chips with channels in this size range, based on electroosmotic flow and fluorescence detection. One application of this technology is rapid assays for drug screening, such as enzyme assays and binding assays. A further challenge in this area is to perform multiple functions on a chip in parallel, without a large increase in the number of inputs and outputs. A first step in this direction is a fluidic serial-to-parallel converter. Fluidic circuits will be shown with the ability to distribute an incoming serial sample stream to multiple parallel channels.

  2. Parallel Environment for Quantum Computing

    NASA Astrophysics Data System (ADS)

    Tabakin, Frank; Diaz, Bruno Julia

    2009-03-01

    To facilitate numerical study of noise and decoherence in QC algorithms, and of the efficacy of error correction schemes, we have developed a Fortran 90 quantum computer simulator with parallel processing capabilities. It permits rapid evaluation of quantum algorithms for a large number of qubits and for various "noise" scenarios. State vectors are distributed over many processors, to employ a large number of qubits. Parallel processing is implemented by the Message-Passing Interface protocol. A description of how to spread the wave function components over many processors, along with how to efficiently describe the action of general one- and two-qubit operators on these state vectors, will be delineated. Grover's search and Shor's factoring algorithms with noise will be discussed as examples. A major feature of this work is that concurrent versions of the algorithms can be evaluated with each version subject to diverse noise effects, corresponding to solving a stochastic Schrödinger equation. The density matrix for the ensemble of such noise cases is constructed using parallel distribution methods to evaluate its associated entropy. Applications of this powerful tool are made to delineate the stability and correction of QC processes using Hamiltonian-based dynamics.
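
    The core state-vector operation such a simulator distributes is the application of a gate to stored amplitudes. A serial numpy sketch of a one-qubit gate acting on qubit q is shown below (big-endian axis convention; in the parallel code the same index arithmetic runs on amplitudes spread across MPI ranks). The example data are synthetic.

    ```python
    # Apply a one-qubit gate to qubit q of an n-qubit state vector by
    # exposing that qubit's axis and contracting it with the gate.
    import numpy as np

    def apply_one_qubit_gate(state, gate, q, n):
        psi = state.reshape([2] * n)
        psi = np.tensordot(gate, psi, axes=([1], [q]))  # acts on qubit q
        psi = np.moveaxis(psi, 0, q)                    # restore axis order
        return psi.reshape(-1)

    n = 3
    state = np.zeros(2 ** n, dtype=complex); state[0] = 1.0   # |000>
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    state = apply_one_qubit_gate(state, H, 0, n)
    print(np.round(state, 3))   # equal superposition on qubit 0
    ```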

  3. Processing Semblances Induced through Inter-Postsynaptic Functional LINKs, Presumed Biological Parallels of K-Lines Proposed for Building Artificial Intelligence

    PubMed Central

    Vadakkan, Kunjumon I.

    2011-01-01

    The internal sensation of memory, which is available only to the owner of an individual nervous system, is difficult to analyze for its basic elements of operation. We hypothesize that associative learning induces the formation of a functional LINK between the postsynapses. During memory retrieval, the activation of either postsynapse re-activates the functional LINK, evoking a semblance of sensory activity arriving at its opposite postsynapse, the nature of which defines the basic unit of internal sensation – namely, the semblion. In neuronal networks that undergo continuous oscillatory activity at certain levels of their organization, re-activation of functional LINKs is expected to induce semblions, enabling the system to continuously learn, self-organize, and demonstrate instantiation, features that can be utilized for developing artificial intelligence (AI). This paper also explains the suitability of the inter-postsynaptic functional LINKs to meet the expectations of Minsky’s K-lines, basic elements of a memory theory generated to develop AI, and methods to replicate semblances outside the nervous system. PMID:21845180

  4. The Function of the Glutamate-Nitric Oxide-cGMP Pathway in Brain in Vivo and Learning Ability Decrease in Parallel in Mature Compared with Young Rats

    ERIC Educational Resources Information Center

    Piedrafita, Blanca; Cauli, Omar; Montoliu, Carmina; Felipo, Vicente

    2007-01-01

    Aging is associated with cognitive impairment, but the underlying mechanisms remain unclear. We have recently reported that the ability of rats to learn a Y-maze conditional discrimination task depends on the function of the glutamate-nitric oxide-cGMP pathway in brain. The aims of the present work were to assess whether the ability of rats to…

  5. Processing Semblances Induced through Inter-Postsynaptic Functional LINKs, Presumed Biological Parallels of K-Lines Proposed for Building Artificial Intelligence.

    PubMed

    Vadakkan, Kunjumon I

    2011-01-01

    The internal sensation of memory, which is available only to the owner of an individual nervous system, is difficult to analyze for its basic elements of operation. We hypothesize that associative learning induces the formation of a functional LINK between the postsynapses. During memory retrieval, the activation of either postsynapse re-activates the functional LINK, evoking a semblance of sensory activity arriving at its opposite postsynapse, the nature of which defines the basic unit of internal sensation - namely, the semblion. In neuronal networks that undergo continuous oscillatory activity at certain levels of their organization, re-activation of functional LINKs is expected to induce semblions, enabling the system to continuously learn, self-organize, and demonstrate instantiation, features that can be utilized for developing artificial intelligence (AI). This paper also explains the suitability of the inter-postsynaptic functional LINKs to meet the expectations of Minsky's K-lines, basic elements of a memory theory generated to develop AI, and methods to replicate semblances outside the nervous system.

  6. Expanding small-molecule functional metagenomics through parallel screening of broad-host-range cosmid environmental DNA libraries in diverse proteobacteria.

    PubMed

    Craig, Jeffrey W; Chang, Fang-Yuan; Kim, Jeffrey H; Obiajulu, Steven C; Brady, Sean F

    2010-03-01

    The small-molecule biosynthetic diversity encoded within the genomes of uncultured bacteria is an attractive target for the discovery of natural products using functional metagenomics. Phenotypes commonly associated with the production of small molecules, such as antibiosis, altered pigmentation, or altered colony morphology, are easily identified from screens of arrayed metagenomic library clones. However, functional metagenomic screening methods are limited by their intrinsic dependence on a heterologous expression host. Toward the goal of increasing the small-molecule biosynthetic diversity found in functional metagenomic studies, we report the phenotypic screening of broad-host-range environmental DNA libraries in six different proteobacteria: Agrobacterium tumefaciens, Burkholderia graminis, Caulobacter vibrioides, Escherichia coli, Pseudomonas putida, and Ralstonia metallidurans. Clone-specific small molecules found in culture broth extracts from pigmented and antibacterially active clones, as well as the genetic elements responsible for the biosynthesis of these metabolites, are described. The host strains used in this investigation provided access to unique sets of clones showing minimal overlap, thus demonstrating the potential advantage conferred on functional metagenomics through the use of multiple diverse host species.

  7. Expanding Small-Molecule Functional Metagenomics through Parallel Screening of Broad-Host-Range Cosmid Environmental DNA Libraries in Diverse Proteobacteria

    PubMed Central

    Craig, Jeffrey W.; Chang, Fang-Yuan; Kim, Jeffrey H.; Obiajulu, Steven C.; Brady, Sean F.

    2010-01-01

    The small-molecule biosynthetic diversity encoded within the genomes of uncultured bacteria is an attractive target for the discovery of natural products using functional metagenomics. Phenotypes commonly associated with the production of small molecules, such as antibiosis, altered pigmentation, or altered colony morphology, are easily identified from screens of arrayed metagenomic library clones. However, functional metagenomic screening methods are limited by their intrinsic dependence on a heterologous expression host. Toward the goal of increasing the small-molecule biosynthetic diversity found in functional metagenomic studies, we report the phenotypic screening of broad-host-range environmental DNA libraries in six different proteobacteria: Agrobacterium tumefaciens, Burkholderia graminis, Caulobacter vibrioides, Escherichia coli, Pseudomonas putida, and Ralstonia metallidurans. Clone-specific small molecules found in culture broth extracts from pigmented and antibacterially active clones, as well as the genetic elements responsible for the biosynthesis of these metabolites, are described. The host strains used in this investigation provided access to unique sets of clones showing minimal overlap, thus demonstrating the potential advantage conferred on functional metagenomics through the use of multiple diverse host species. PMID:20081001

  8. Unique roles of glucagon and glucagon-like peptides: Parallels in understanding the functions of adipokinetic hormones in stress responses in insects.

    PubMed

    Bednářová, Andrea; Kodrík, Dalibor; Krishnan, Natraj

    2013-01-01

    Glucagon is conventionally regarded as a hormone that is counter-regulatory in function to insulin and plays a critical anti-hypoglycemic role by maintaining glucose homeostasis in both animals and humans. Glucagon performs this function by increasing hepatic glucose output to the blood, stimulating glycogenolysis and gluconeogenesis in response to starvation. Additionally, it plays a homeostatic role by decreasing glycogenesis and glycolysis in tandem to maintain optimal glucose levels. In performing this action it also increases energy expenditure, which is contrary to what one would expect, and it has actions which are unique and not entirely in agreement with its role in protection from hypoglycemia. Interestingly, glucagon-like peptides (GLP-1 and GLP-2) from the major fragment of proglucagon (in non-mammalian vertebrates, as well as in mammals) may also modulate the response to stress in addition to their other physiological actions. These unique modes of action occur in response to psychological, metabolic and other stress situations and mirror the role of adipokinetic hormones (AKHs) in insects, which perform a similar function. The findings on the anti-stress roles of glucagon and glucagon-like peptides in mammalian and non-mammalian vertebrates may throw light on the multiple stress-responsive mechanisms which operate in a concerted manner under regulation by AKH in insects, which thus functions as a stress-responsive hormone while also maintaining organismal homeostasis.

  9. Expression of mitochondrial regulatory genes parallels respiratory capacity and contractile function in a rat model of hypoxia-induced right ventricular hypertrophy

    USDA-ARS?s Scientific Manuscript database

    Chronic hypobaric hypoxia (CHH) increases load on the right ventricle (RV), resulting in RV hypertrophy. We hypothesized that CHH elicits distinct responses, i.e., that the hypertrophied RV, unlike the left ventricle (LV), displays enhanced mitochondrial respiratory and contractile function. Wistar rats...

  10. Parallel optical sampler

    DOEpatents

    Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A

    2014-05-20

    An optical sampler includes first and second 1×n optical beam splitters that split an input optical sampling signal and an optical analog input signal into n parallel channels, respectively; a plurality of optical delay elements providing n parallel delayed input optical sampling signals; n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals; and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode-interconnected Mach-Zehnder modulator. A method of sampling the optical analog input signal is disclosed.

  11. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Among the highly parallel computing architectures required for advanced scientific computation, those designated 'MIMD' and 'SIMD' have yielded the best results to date. The present evaluation of the development status of such architectures shows that neither has attained a decisive advantage in the treatment of most near-homogeneous problems; for problems involving numerous dissimilar parts, however, such currently speculative architectures as 'neural networks' or 'data flow' machines may be required. Data flow computers are the most practical form of MIMD fine-grained parallel computers yet conceived; they automatically solve the problem of assigning virtual processors to the real processors in the machine.

  12. Parallel programming with Ada

    SciTech Connect

    Kok, J.

    1988-01-01

    To the human programmer the ease of coding distributed computing is highly dependent on the suitability of the employed programming language. But with a particular language it is also important whether the possibilities of one or more parallel architectures can efficiently be addressed by available language constructs. In this paper the possibilities are discussed of the high-level language Ada and in particular of its tasking concept as a descriptional tool for the design and implementation of numerical and other algorithms that allow execution of parts in parallel. Language tools are explained and their use for common applications is shown. Conclusions are drawn about the usefulness of several Ada concepts.

  13. The NAS Parallel Benchmarks

    SciTech Connect

    Bailey, David H.

    2009-11-15

    The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aerodynamic Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure for selecting new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to the parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage.

  14. Speeding up parallel processing

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratories, which showed speedup on a 1024-node hypercube of over 500 for three fixed-size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
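
    For reference, the two formulations at issue can be written out. With serial fraction s and n processors, the standard forms of Amdahl's law and of the scaled speedup used to interpret the Sandia results (Gustafson's law) are:

    ```latex
    % Amdahl's law: fixed problem size, serial fraction s, n processors
    S_{\mathrm{Amdahl}}(n) = \frac{1}{s + (1 - s)/n} \;\le\; \frac{1}{s}

    % Scaled speedup (Gustafson): the parallel work grows with n while
    % the serial portion stays fixed
    S_{\mathrm{scaled}}(n) = s + (1 - s)\,n = n - s\,(n - 1)
    ```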

  15. CRUNCH_PARALLEL

    SciTech Connect

    Shumaker, Dana E.; Steefel, Carl I.

    2016-06-21

    The code CRUNCH_PARALLEL is a parallel version of the CRUNCH code. CRUNCH code version 2.0 was previously released by LLNL (UCRL-CODE-200063). Crunch is a general-purpose reactive transport code developed by Carl Steefel and Yabusake (Steefel and Yabusake, 1996). The code handles non-isothermal transport and reaction in one, two, and three dimensions. The reaction algorithm is generic in form, handling an arbitrary number of aqueous and surface complexation reactions as well as mineral dissolution/precipitation. A standardized database is used containing thermodynamic and kinetic data. The code includes advective, dispersive, and diffusive transport.

  16. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

  17. Parallel reduction in expression, but no loss of functional constraint, in two opsin paralogs within cave populations of Gammarus minus (Crustacea: Amphipoda).

    PubMed

    Carlini, David B; Satish, Suma; Fong, Daniel W

    2013-04-23

    Gammarus minus, a freshwater amphipod living in the cave and surface streams in the eastern USA, is a premier candidate for studying the evolution of troglomorphic traits such as pigmentation loss, elongated appendages, and reduced eyes. In G. minus, multiple pairs of genetically related, physically proximate cave and surface populations exist which exhibit a high degree of intraspecific morphological divergence. The morphology, ecology, and genetic structure of these sister populations are well characterized, yet the genetic basis of their morphological divergence remains unknown. We used degenerate PCR primers designed to amplify opsin genes within the subphylum Crustacea and discovered two distinct opsin paralogs (average inter-paralog protein divergence ≈ 20%) in the genome of three independently derived pairs of G. minus cave and surface populations. Both opsin paralogs were found to be related to other crustacean middle wavelength sensitive opsins. Low levels of nucleotide sequence variation (< 1% within populations) were detected in both opsin genes, regardless of habitat, and dN/dS ratios did not indicate a relaxation of functional constraint in the cave populations with reduced or absent eyes. Maximum likelihood analyses using codon-based models also did not detect a relaxation of functional constraint in the cave lineages. We quantified expression level of both opsin genes and found that the expression of both paralogs was significantly reduced in all three cave populations relative to their sister surface populations. The concordantly lowered expression level of both opsin genes in cave populations of G. minus compared to sister surface populations, combined with evidence for persistent purifying selection in the cave populations, is consistent with an unspecified pleiotropic function of opsin proteins. Our results indicate that phototransduction proteins such as opsins may have retained their function in cave-adapted organisms because they may play a

  18. Parallel reduction in expression, but no loss of functional constraint, in two opsin paralogs within cave populations of Gammarus minus (Crustacea: Amphipoda)

    PubMed Central

    2013-01-01

    Background Gammarus minus, a freshwater amphipod living in the cave and surface streams in the eastern USA, is a premier candidate for studying the evolution of troglomorphic traits such as pigmentation loss, elongated appendages, and reduced eyes. In G. minus, multiple pairs of genetically related, physically proximate cave and surface populations exist which exhibit a high degree of intraspecific morphological divergence. The morphology, ecology, and genetic structure of these sister populations are well characterized, yet the genetic basis of their morphological divergence remains unknown. Results We used degenerate PCR primers designed to amplify opsin genes within the subphylum Crustacea and discovered two distinct opsin paralogs (average inter-paralog protein divergence ≈ 20%) in the genome of three independently derived pairs of G. minus cave and surface populations. Both opsin paralogs were found to be related to other crustacean middle wavelength sensitive opsins. Low levels of nucleotide sequence variation (< 1% within populations) were detected in both opsin genes, regardless of habitat, and dN/dS ratios did not indicate a relaxation of functional constraint in the cave populations with reduced or absent eyes. Maximum likelihood analyses using codon-based models also did not detect a relaxation of functional constraint in the cave lineages. We quantified expression level of both opsin genes and found that the expression of both paralogs was significantly reduced in all three cave populations relative to their sister surface populations. Conclusions The concordantly lowered expression level of both opsin genes in cave populations of G. minus compared to sister surface populations, combined with evidence for persistent purifying selection in the cave populations, is consistent with an unspecified pleiotropic function of opsin proteins. Our results indicate that phototransduction proteins such as opsins may have retained their function in cave

  19. Parallel effects of β-adrenoceptor blockade on cardiac function and fatty acid oxidation in the diabetic heart: Confronting the maze

    PubMed Central

    Sharma, Vijay; McNeill, John H

    2011-01-01

    Diabetic cardiomyopathy is a disease process in which diabetes produces a direct and continuous myocardial insult even in the absence of ischemic, hypertensive or valvular disease. The β-blocking agents bisoprolol, carvedilol and metoprolol have been shown in large-scale randomized controlled trials to reduce heart failure mortality. In this review, we summarize the results of our studies investigating the effects of β-blocking agents on cardiac function and metabolism in diabetic heart failure, and the complex inter-related mechanisms involved. Metoprolol inhibits fatty acid oxidation at the mitochondrial level but does not prevent lipotoxicity; its beneficial effects are more likely to be due to pro-survival effects of chronic treatment. These studies have expanded our understanding of the range of effects produced by β-adrenergic blockade and show how interconnected the signaling pathways of function and metabolism are in the heart. Although our initial hypothesis that inhibition of fatty acid oxidation would be a key mechanism of action was disproved, unexpected results led us to some intriguing regulatory mechanisms of cardiac metabolism. The first was upstream stimulatory factor-2-mediated repression of transcriptional master regulator PGC-1α, most likely occurring as a consequence of the improved function; it is unclear whether this effect is unique to β-blockers, although repression of carnitine palmitoyltransferase (CPT)-1 has not been reported with other drugs which improve function. The second was the identification of a range of covalent modifications which can regulate CPT-1 directly, mediated by a signalome at the level of the mitochondria. We also identified an important interaction between β-adrenergic signaling and caveolins, which may be a key mechanism of action of β-adrenergic blockade. Our experience with this labyrinthine signaling web illustrates that initial hypotheses and anticipated directions do not have to be right in order to

  20. Parallel re-modeling of EF-1α function: divergent EF-1α genes co-occur with EFL genes in diverse distantly related eukaryotes

    PubMed Central

    2013-01-01

    Background Elongation factor-1α (EF-1α) and elongation factor-like (EFL) proteins are functionally homologous to one another, and are core components of the eukaryotic translation machinery. The patchy distribution of the two elongation factor types across global eukaryotic phylogeny is suggestive of a ‘differential loss’ hypothesis that assumes that EF-1α and EFL were present in the most recent common ancestor of eukaryotes followed by independent differential losses of one of the two factors in the descendant lineages. To date, however, just one diatom and one fungus have been found to have both EF-1α and EFL (dual-EF-containing species). Results In this study, we characterized 35 new EF-1α/EFL sequences from phylogenetically diverse eukaryotes. In so doing we identified 11 previously unreported dual-EF-containing species from diverse eukaryote groups including the Stramenopiles, Apusomonadida, Goniomonadida, and Fungi. Phylogenetic analyses suggested vertical inheritance of both genes in each of the dual-EF lineages. In the dual-EF-containing species we identified, the EF-1α genes appeared to be highly divergent in sequence and suppressed at the transcriptional level compared to the co-occurring EFL genes. Conclusions According to the known EF-1α/EFL distribution, the differential loss process should have occurred independently in diverse eukaryotic lineages, and more dual-EF-containing species remain unidentified. We predict that dual-EF-containing species retain the divergent EF-1α homologues only for a sub-set of the original functions. As the dual-EF-containing species are distantly related to each other, we propose that independent re-modelling of EF-1α function took place in multiple branches in the tree of eukaryotes. PMID:23800323

  1. The Effect of Parallel-hole Collimator Material on Image and Functional Parameters in SPECT Imaging: A SIMIND Monte Carlo Study.

    PubMed

    Azarm, Ahmadreza; Islamian, Jalil Pirayesh; Mahmoudian, Babak; Gharepapagh, Esmaeil

    2015-01-01

    The collimator in single-photon emission computed tomography (SPECT) is a critical component of the imaging system and plays an important role in imaging quality. In this study, the effect of the collimator material on the radioisotopic image and its functional parameters was studied. The simulating medical imaging nuclear detectors (SIMIND) Monte Carlo program was used to simulate a Siemens E.CAM SPECT (Siemens Medical Solutions, Erlangen, Germany) system equipped with a low-energy high-resolution (LEHR) collimator. The simulation and experimental data from the SPECT imaging modality using (99m)Tc were obtained on a point source and a Jaszczak phantom. Seventeen high atomic number materials were considered as LEHR collimator materials. In order to determine the effect of the collimator material on the image and functional parameters, the energy resolution, spatial resolution, contrast, and collimator characteristic parameters such as septal penetration and scatter-to-primary ratio were investigated. Energy spectra profiles, full widths at half maximum (FWHMs) (mm) of the point spread function (PSF) curves, system sensitivity, and contrast of cold spheres of the Jaszczak phantom for the simulated and experimental systems are acceptably superimposed. The results of FWHM and energy resolution for the 17 collimators showed that the collimator made of 98% lead and 2% antimony could provide the best FWHM and energy resolution, 7.68 mm and 9.87%, respectively. The LEHR collimator with 98% lead and 2% antimony offers the best resolution and contrast when compared to other high atomic number metals and alloys.

  2. The Effect of Parallel-hole Collimator Material on Image and Functional Parameters in SPECT Imaging: A SIMIND Monte Carlo Study

    PubMed Central

    Azarm, Ahmadreza; Islamian, Jalil Pirayesh; Mahmoudian, Babak; Gharepapagh, Esmaeil

    2015-01-01

    The collimator in single-photon emission computed tomography (SPECT) is a critical component of the imaging system and plays an important role in imaging quality. In this study, the effect of the collimator material on the radioisotopic image and its functional parameters was studied. The simulating medical imaging nuclear detectors (SIMIND) Monte Carlo program was used to simulate a Siemens E.CAM SPECT (Siemens Medical Solutions, Erlangen, Germany) system equipped with a low-energy high-resolution (LEHR) collimator. The simulation and experimental data from the SPECT imaging modality using 99mTc were obtained on a point source and a Jaszczak phantom. Seventeen high atomic number materials were considered as LEHR collimator materials. In order to determine the effect of the collimator material on the image and functional parameters, the energy resolution, spatial resolution, contrast, and collimator characteristic parameters such as septal penetration and scatter-to-primary ratio were investigated. Energy spectra profiles, full widths at half maximum (FWHMs) (mm) of the point spread function (PSF) curves, system sensitivity, and contrast of cold spheres of the Jaszczak phantom for the simulated and experimental systems are acceptably superimposed. The results of FWHM and energy resolution for the 17 collimators showed that the collimator made of 98% lead and 2% antimony could provide the best FWHM and energy resolution, 7.68 mm and 9.87%, respectively. The LEHR collimator with 98% lead and 2% antimony offers the best resolution and contrast when compared to other high atomic number metals and alloys. PMID:26420985

  3. Parallel re-modeling of EF-1α function: divergent EF-1α genes co-occur with EFL genes in diverse distantly related eukaryotes.

    PubMed

    Kamikawa, Ryoma; Brown, Matthew W; Nishimura, Yuki; Sako, Yoshihiko; Heiss, Aaron A; Yubuki, Naoji; Gawryluk, Ryan; Simpson, Alastair G B; Roger, Andrew J; Hashimoto, Tetsuo; Inagaki, Yuji

    2013-06-26

    Elongation factor-1α (EF-1α) and elongation factor-like (EFL) proteins are functionally homologous to one another, and are core components of the eukaryotic translation machinery. The patchy distribution of the two elongation factor types across global eukaryotic phylogeny is suggestive of a 'differential loss' hypothesis that assumes that EF-1α and EFL were present in the most recent common ancestor of eukaryotes followed by independent differential losses of one of the two factors in the descendant lineages. To date, however, just one diatom and one fungus have been found to have both EF-1α and EFL (dual-EF-containing species). In this study, we characterized 35 new EF-1α/EFL sequences from phylogenetically diverse eukaryotes. In so doing we identified 11 previously unreported dual-EF-containing species from diverse eukaryote groups including the Stramenopiles, Apusomonadida, Goniomonadida, and Fungi. Phylogenetic analyses suggested vertical inheritance of both genes in each of the dual-EF lineages. In the dual-EF-containing species we identified, the EF-1α genes appeared to be highly divergent in sequence and suppressed at the transcriptional level compared to the co-occurring EFL genes. According to the known EF-1α/EFL distribution, the differential loss process should have occurred independently in diverse eukaryotic lineages, and more dual-EF-containing species remain unidentified. We predict that dual-EF-containing species retain the divergent EF-1α homologues only for a sub-set of the original functions. As the dual-EF-containing species are distantly related to each other, we propose that independent re-modelling of EF-1α function took place in multiple branches in the tree of eukaryotes.

  4. Massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Fung, L. W. (Inventor)

    1983-01-01

    An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.

  5. Parallel hierarchical radiosity rendering

    SciTech Connect

    Carter, Michael

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  6. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  7. [The parallel saw blade].

    PubMed

    Mühldorfer-Fodor, M; Hohendorff, B; Prommersberger, K-J; van Schoonhoven, J

    2011-04-01

    For shortening osteotomy, two exactly parallel osteotomies are needed to assure congruent adaptation of the shortened bone after segment resection. This is required for regular bone healing. In addition, it is difficult to shorten a bone by a precise distance using an oblique segment resection. A mobile spacer between two saw blades keeps the blades exactly parallel at a fixed distance during the osteotomy cut. The parallel saw blades from Synthes® are designed for 2, 2.5, 3, 4, and 5 mm shortening distances. Two types of blades are available (e.g., for transverse or oblique osteotomies) to assure precise shortening. Preoperatively, the desired type of osteotomy (transverse or oblique) and the shortening distance have to be determined. Then the appropriate parallel saw blade is chosen, which is compatible with the Synthes® Colibri with an oscillating saw attachment. During the osteotomy cut, the spacer should be kept as close to the bone as possible. Excessive force that may deform the blades should be avoided. Before manipulating the bone ends, it is important to confirm that the bone has been completely divided by both saw blades, to prevent fracturing of the corticalis with bony spurs. The shortening osteotomy is mainly fixated by plate osteosynthesis. For compression of the bone ends, the screws should be placed eccentrically in the plate holes. For an oblique osteotomy, an additional lag screw should be used.

  8. Parallel Coordinate Axes.

    ERIC Educational Resources Information Center

    Friedlander, Alex; And Others

    1982-01-01

    Several methods of numerical mapping other than the usual Cartesian coordinate system are considered. Some examples using parallel axes representation, which are seen to lead to aesthetically pleasing or interesting configurations, are presented. Exercises with alternative representations can stimulate pupil imagination and exploration in…

  9. Parallel Dislocation Simulator

    SciTech Connect

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  10. Parallel fast gauss transform

    SciTech Connect

    Sampath, Rahul S; Sundar, Hari; Veerapaneni, Shravan

    2010-01-01

    We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N^2) time. The parallel time complexity estimates for our algorithms are O(N/n_p) for uniform point distributions and O((N/n_p) log(N/n_p) + n_p log n_p) for non-uniform distributions using n_p CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits 'diagonal translation'. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer. Our implementation is 'kernel-independent' and can handle other 'Gaussian-type' kernels even when an explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.

  11. Progress in parallelizing XOOPIC

    NASA Astrophysics Data System (ADS)

    Mardahl, Peter; Verboncoeur, J. P.

    1997-11-01

    XOOPIC (Object-Oriented Particle-in-Cell code for X11-based Unix workstations) is presently a serial 2-D 3v particle-in-cell plasma simulation (J.P. Verboncoeur, A.B. Langdon, and N.T. Gladd, "An object-oriented electromagnetic PIC code," Computer Physics Communications 87 (1995) 199-211). The present effort focuses on using parallel and distributed processing to optimize the simulation for large problems. The benefits include increased capacity for memory-intensive problems and improved performance for processor-intensive problems. The MPI library is used to enable the parallel version to be easily ported to massively parallel, SMP, and distributed computers. The philosophy employed here is to spatially decompose the system into computational regions separated by 'virtual boundaries': objects which contain the local data and algorithms to perform the local field solve and the particle communication between regions. This design confines the changes required by parallelization, leaving the rest of the program largely untouched. Specific implementation details, such as the hiding of communication latency behind local computation, will also be discussed.
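
    The 'virtual boundary' pattern is generic and can be sketched without reference to XOOPIC's actual code. In the hedged mpi4py example below, each rank owns a strip of a 1D field plus one ghost cell per side, exchanges ghosts with its neighbors, and then updates its interior independently; the field, the update rule, and the script name are all invented for illustration.

    ```python
    # Generic sketch of spatial decomposition with ghost-cell exchange
    # (not XOOPIC's actual code). Run under, e.g.,
    #   mpiexec -n 4 python demo.py   (demo.py is a hypothetical name)
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    local = np.full(10 + 2, float(rank))   # interior + 2 ghost cells

    # Exchange ghost cells with neighbors (the "virtual boundary").
    comm.Sendrecv(sendbuf=local[1:2], dest=left,
                  recvbuf=local[-1:], source=right)
    comm.Sendrecv(sendbuf=local[-2:-1], dest=right,
                  recvbuf=local[0:1], source=left)

    # Each rank now updates its interior cells independently, e.g. a
    # simple diffusion step:
    local[1:-1] = 0.5 * local[1:-1] + 0.25 * (local[:-2] + local[2:])
    ```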

  12. Parallel hierarchical global illumination

    SciTech Connect

    Snell, Quinn O.

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  13. High performance parallel architectures

    SciTech Connect

    Anderson, R.E.

    1989-09-01

    In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user programmer's point-of-view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.

  14. Parallel Multigrid Equation Solver

    SciTech Connect

    Adams, Mark

    2001-09-07

    Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved problems of up to 76 million degrees of freedom, including problems in linear elasticity on the ASCI Blue Pacific and ASCI Red machines.

  15. Two-Axis Acceleration of Functional Connectivity Magnetic Resonance Imaging by Parallel Excitation of Phase-Tagged Slices and Half k-Space Acceleration

    PubMed Central

    Jesmanowicz, Andrzej; Nencka, Andrew S.; Li, Shi-Jiang

    2011-01-01

    Abstract Whole brain functional connectivity magnetic resonance imaging requires acquisition of a time course of gradient-recalled (GR) volumetric images. A method is developed to accelerate this acquisition using GR echo-planar imaging and radio frequency (RF) slice phase tagging. For N-fold acceleration, a tailored RF pulse excites N slices using a uniform-field transmit coil. This pulse is the Fourier transform of the profile for the N slices with a predetermined RF phase tag on each slice. A multichannel RF receive coil is used for detection. For n slices, there are n/N groups of slices. Signal-averaged reference images are created for each slice within each slice group for each member of the coil array and used to separate overlapping images that are simultaneously received. The time-overhead for collection of reference images is small relative to the acquisition time of a complete volumetric time course. A least-squares singular value decomposition method allows image separation on a pixel-by-pixel basis. Twofold slice acceleration is demonstrated using an eight-channel RF receive coil, with application to resting-state functional magnetic resonance imaging in the human brain. Data from six subjects at 3 T are reported. The method has been extended to half k-space acquisition, which not only provides additional acceleration, but also facilitates slice separation because of increased signal intensity of the central lines of k-space coupled with reduced susceptibility effects. PMID:22432957
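
    The pixel-by-pixel least-squares separation can be sketched in a few lines of Python; the per-pixel mixing matrix A (coil sensitivities with phase tags, taken from the reference images) and all shapes below are illustrative assumptions, not the authors' reconstruction code.

      # Separate N overlapped slices at each pixel by solving y = A x in the
      # least-squares sense with an SVD-based pseudoinverse.
      import numpy as np

      n_coils, n_slices, n_pix = 8, 2, 4096
      rng = np.random.default_rng(0)
      A = (rng.normal(size=(n_pix, n_coils, n_slices))
           + 1j * rng.normal(size=(n_pix, n_coils, n_slices)))  # reference data
      y = (rng.normal(size=(n_pix, n_coils))
           + 1j * rng.normal(size=(n_pix, n_coils)))            # overlapped data

      separated = np.empty((n_pix, n_slices), dtype=complex)
      for p in range(n_pix):
          separated[p] = np.linalg.pinv(A[p]) @ y[p]   # per-pixel slice values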

  16. The effect of a core exercise program on Cobb angle and back muscle activity in male students with functional scoliosis: a prospective, randomized, parallel-group, comparative study.

    PubMed

    Park, Yun Hee; Park, Young Sook; Lee, Yong Taek; Shin, Hee Suk; Oh, Min-Kyun; Hong, Jiyeon; Lee, Kyoung Yul

    2016-06-01

    To assess the effect of core strengthening exercises on Cobb angle and muscle activity in male college students with functional scoliosis. Static and dynamic back muscle activity were evaluated via surface electromyography (sEMG). A core exercise protocol comprising 18 exercises was performed three times/week for 10 weeks. Patients were randomly allocated to either a home- or community-based exercise programme. Cervical thoracolumbar scans and sEMG were performed after 10 weeks. A total of 87 students underwent cervical thoracolumbar scans. Of these, 53 were abnormal and were randomised to the home-based (n = 25) or community-based (n = 28) group. After the 10-week exercise programme, Cobb angles were significantly lower and back muscle strength significantly improved relative to baseline in both groups, but there were no statistically significant between-group differences. A 10-week core strengthening exercise programme decreases Cobb angle and improves back muscle strength in patients with functional scoliosis. © The Author(s) 2016.

  17. Asynchronous parallel pattern search for nonlinear optimization

    SciTech Connect

    P. D. Hough; T. G. Kolda; V. J. Torczon

    2000-01-01

    Parallel pattern search (PPS) can be quite useful for engineering optimization problems characterized by a small number of variables (say 10-50) and by expensive objective function evaluations, such as complex simulations that take from minutes to hours to run. However, PPS, which was originally designed for execution on homogeneous and tightly-coupled parallel machines, is not well suited to the more heterogeneous, loosely-coupled, and even fault-prone parallel systems available today. Specifically, PPS is hindered by synchronization penalties and cannot recover in the event of a failure. The authors introduce a new asynchronous and fault-tolerant parallel pattern search (APPS) method and demonstrate its effectiveness on both simple test problems and some engineering optimization problems.
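
    The flavor of the asynchronous approach can be sketched in Python as follows; this is an editorial illustration of asynchronous polling, not the authors' APPS implementation, and the objective, step control, and worker pool are assumptions.

      # Asynchronous pattern search sketch: workers evaluate poll points
      # independently and the search reacts to each result as it arrives,
      # rather than waiting at a synchronization barrier for the whole poll set.
      from concurrent.futures import ThreadPoolExecutor, as_completed
      import numpy as np

      def objective(x):                     # stand-in for an expensive simulation
          return float(np.sum((x - 1.0) ** 2))

      def async_pattern_search(x0, step=1.0, tol=1e-3, workers=4):
          best_x = np.asarray(x0, dtype=float)
          best_f = objective(best_x)
          dirs = np.vstack([np.eye(len(best_x)), -np.eye(len(best_x))])
          with ThreadPoolExecutor(max_workers=workers) as pool:
              while step > tol:
                  futures = {pool.submit(objective, best_x + step * d): d for d in dirs}
                  improved = False
                  for fut in as_completed(futures):    # act on results as they arrive
                      if fut.result() < best_f:
                          best_x = best_x + step * futures[fut]
                          best_f = fut.result()
                          improved = True
                          break                        # re-poll around the new best
                  if not improved:
                      step *= 0.5                      # contract after a failed poll
          return best_x, best_f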

  18. Learning at different satiation levels reveals parallel functions for the cAMP-protein kinase A cascade in formation of long-term memory.

    PubMed

    Friedrich, Anke; Thomas, Ulf; Müller, Uli

    2004-05-05

    Learning and memory formation in intact animals are generally studied under defined parameters, including the control of feeding. We used associative olfactory conditioning of the proboscis extension response in honeybees to address effects of feeding status on processes of learning and memory formation. Comparing groups of animals with different but defined feeding status at the time of conditioning reveals new and characteristic features in memory formation. In animals fed 18 hr earlier, three-trial conditioning induces a stable memory that consists of different phases: a mid-term memory (MTM), translation-dependent early long-term memory (eLTM; 1-2 d), and a transcription-dependent late LTM (lLTM; > or =3 d). Additional feeding of a small amount of sucrose 4 hr before conditioning leads to a loss of all of these memory phases. Interestingly, the basal activity of the cAMP-dependent protein kinase A (PKA), a key player in LTM formation, differs in animals with different satiation levels. Pharmacological rescue of the low basal PKA activity in animals fed 4 hr before conditioning points to a specific function of the cAMP-PKA cascade in mediating satiation-dependent memory formation. An increase in PKA activity during conditioning rescues only the transcription-dependent lLTM; acquisition, MTM, and eLTM are still impaired. Thus, during conditioning, the cAMP-PKA cascade mediates the induction of the transcription-dependent lLTM, depending on the satiation level. This result provides the first evidence for a central and distinct function of the cAMP-PKA cascade connecting satiation level with learning.

  19. The impact of oat (Avena sativa) consumption on biomarkers of renal function in patients with chronic kidney disease: A parallel randomized clinical trial.

    PubMed

    Rouhani, Mohammad Hossein; Mortazavi Najafabadi, Mojgan; Surkan, Pamela J; Esmaillzadeh, Ahmad; Feizi, Awat; Azadbakht, Leila

    2016-12-02

    Animal studies report that oat (Avena sativa L) intake has favorable effects on kidney function. However, the effects of oat consumption have not been assessed in humans. The aim of this study was to examine the impact of oat intake on biomarkers of renal function in patients with chronic kidney disease (CKD). Fifty-two patients with CKD were randomly assigned to a control group (recommended to reduce intake of dietary protein, phosphorus, sodium and potassium) or an oat consumption group (given nutritional recommendations for controls +50 g/day oats). Blood urea nitrogen (BUN), serum creatinine (SCr), urine creatinine, serum albumin, serum potassium, parathyroid hormone (PTH), serum klotho and urine protein concentration were measured at baseline and after an eight-week intervention. Creatinine clearance was calculated using urine creatinine concentration. Within group analysis showed a significant increase in BUN (P = 0.02) and serum potassium (P = 0.01) and a marginally significant increment in SCr (P = 0.08) among controls. However, changes in the oat group were not significant. In a multivariate adjusted model, we observed a significant difference in change of serum potassium (-0.03 mEq/L for oat group and 0.13 mEq/L for control group; P = 0.01) and a marginally significant difference in change of serum albumin (0.01 g/dl for oat group and -0.08 for control group; P = 0.08) between the two groups. There was no change in PTH concentration. Intake of oats may have a beneficial effect on serum albumin and serum potassium in patients with CKD. Present study registered under IRCT.ir identifier no. IRCT2015050414551N2. Copyright © 2016. Published by Elsevier Ltd.

  20. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  1. PARALLEL ASSAY OF OXYGEN EQUILIBRIA OF HEMOGLOBIN

    PubMed Central

    Lilly, Laura E.; Blinebry, Sara K.; Viscardi, Chelsea M.; Perez, Luis; Bonaventura, Joe; McMahon, Tim J.

    2013-01-01

    Methods to systematically analyze in parallel the function of multiple protein or cell samples in vivo or ex vivo (i.e. functional proteomics) in a controlled gaseous environment have thus far been limited. Here we describe an apparatus and procedure that enables, for the first time, parallel assay of oxygen equilibria in multiple samples. Using this apparatus, numerous simultaneous oxygen equilibrium curves (OECs) can be obtained under truly identical conditions from blood cell samples or purified hemoglobins (Hbs). We suggest that the ability to obtain these parallel datasets under identical conditions can be of immense value, both to biomedical researchers and clinicians who wish to monitor blood health, and to physiologists studying non-human organisms and the effects of climate change on these organisms. Parallel monitoring techniques are essential in order to better understand the functions of critical cellular proteins. The procedure can be applied to human studies, wherein an OEC can be analyzed in light of an individual’s entire genome. Here, we analyzed intraerythrocytic Hb, a protein that operates at the organism’s environmental interface and then comes into close contact with virtually all of the organism’s cells. The apparatus is theoretically scalable, and establishes a functional proteomic screen that can be correlated with genomic information on the same individuals. This new method is expected to accelerate our general understanding of protein function, an increasingly challenging objective as advances in proteomic and genomic throughput outpace the ability to study proteins’ functional properties. PMID:23827235

  2. Hybrid Optimization Parallel Search PACKage

    SciTech Connect

    2009-11-10

    HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.
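
    The evaluate-in-parallel-with-cache pattern that the framework provides can be sketched in Python; this is an illustration of the idea only, not the HOPSPACK API, and the objective function and pool size are assumptions.

      # Memoized parallel function evaluation: repeated trial points are served
      # from the cache, new points are evaluated concurrently by a worker pool.
      from concurrent.futures import ThreadPoolExecutor

      def make_cached_evaluator(f, workers=4):
          cache, pool = {}, ThreadPoolExecutor(max_workers=workers)
          def evaluate(points):              # points must be hashable (tuples)
              new = [p for p in dict.fromkeys(points) if p not in cache]
              for p, val in zip(new, pool.map(f, new)):   # parallel evaluations
                  cache[p] = val
              return [cache[p] for p in points]
          return evaluate

      evaluate = make_cached_evaluator(lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2)
      print(evaluate([(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]))  # duplicate point hits the cache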

  3. Suggestions to Reduce Clinical Fibromyalgia Pain and Experimentally Induced Pain Produce Parallel Effects on Perceived Pain but Divergent Functional MRI–Based Brain Activity

    PubMed Central

    Derbyshire, Stuart W.G.; Whalley, Matthew G.; Seah, Stanley T.H.; Oakley, David A.

    2017-01-01

    ABSTRACT Objective Hypnotic suggestion is an empirically validated form of pain control; however, the underlying mechanism remains unclear. Methods Thirteen fibromyalgia patients received suggestions to alter their clinical pain, and 15 healthy controls received suggestions to alter experimental heat pain. Suggestions were delivered before and after hypnotic induction with blood oxygen level–dependent (BOLD) activity measured concurrently. Results Across groups, suggestion produced substantial changes in pain report (main effect of suggestion, F2, 312 = 585.8; p < .0001), with marginally larger changes after induction (main effect of induction, F1, 312 = 3.6; p = .060). In patients, BOLD response increased with pain report in regions previously associated with pain, including thalamus and anterior cingulate cortex. In controls, BOLD response decreased with pain report. All changes were greater after induction. Region-of-interest analysis revealed largely linear patient responses with increasing pain report. Control responses, however, were higher after suggestion to increase or decrease pain from baseline. Conclusions Based on behavioral report alone, the mechanism of suggestion could be interpreted as largely similar regardless of the induction or type of pain experience. The functional magnetic resonance imaging data, however, demonstrated larger changes in brain activity after induction and a radically different pattern of brain activity for clinical pain compared with experimental pain. These findings imply that induction has an important effect on underlying neural activity mediating the effects of suggestion, and the mechanism of suggestion in patients altering clinical pain differs from that in controls altering experimental pain. Patient responses imply that suggestions altered pain experience via corresponding changes in pain-related brain regions, whereas control responses imply suggestion engaged cognitive control. PMID:27490850

  4. Parallel multilevel preconditioners

    SciTech Connect

    Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.

    1989-01-01

    In this paper, we shall report on some techniques for the development of preconditioners for the discrete systems which arise in the approximation of solutions to elliptic boundary value problems. Here we shall only state the resulting theorems. It has been demonstrated that preconditioned iteration techniques often lead to the most computationally effective algorithms for the solution of the large algebraic systems corresponding to boundary value problems in two and three dimensional Euclidean space. The use of preconditioned iteration will become even more important on computers with parallel architecture. This paper discusses an approach for developing completely parallel multilevel preconditioners. In order to illustrate the resulting algorithms, we shall describe the simplest application of the technique to a model elliptic problem.

  5. Homology, convergence and parallelism.

    PubMed

    Ghiselin, Michael T

    2016-01-05

    Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. © 2015 The Author(s).

  6. Parallel Anisotropic Tetrahedral Adaptation

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Darmofal, David L.

    2008-01-01

    An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic are described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000:1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach, as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.

  7. Parallel grid population

    DOEpatents

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
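
    Reading the two assignment phases as code may help; the following Python sketch (1-D intervals standing in for objects, multiprocessing standing in for the n processors) is an editorial illustration of the claim's two-phase scheme, not the patented implementation.

      # Phase 1: each processor classifies its share of objects by the grid
      # portions they touch. Phase 2: each processor populates its own portion.
      from multiprocessing import Pool

      N = 4                                              # processors = portions
      cells = [(i / N, (i + 1) / N) for i in range(N)]   # 1-D grid portions

      def classify(objs):                                # objs: list of (lo, hi)
          return [((lo, hi), [i for i, (clo, chi) in enumerate(cells)
                              if lo < chi and hi > clo])
                  for lo, hi in objs]

      if __name__ == "__main__":
          objects = [(0.1, 0.3), (0.2, 0.6), (0.7, 0.9), (0.45, 0.55)]
          with Pool(N) as pool:
              classified = [pair for part in pool.map(classify,
                            [objects[i::N] for i in range(N)]) for pair in part]
          portions = {i: [obj for obj, touched in classified if i in touched]
                      for i in range(N)}                 # phase 2 gather
          print(portions)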

  8. Homology, convergence and parallelism

    PubMed Central

    Ghiselin, Michael T.

    2016-01-01

    Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. PMID:26598721

  9. Parallel Subconvolution Filtering Architectures

    NASA Technical Reports Server (NTRS)

    Gray, Andrew A.

    2003-01-01

    These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete-Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than by the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
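
    The overlap-and-save building block itself is compact; here is a NumPy sketch (block size and filter are illustrative) of the frequency-domain filtering that the architectures decompose into subfilters.

      # Overlap-save FFT convolution: process the signal in overlapping blocks,
      # multiply by the filter's frequency response, discard the aliased samples.
      import numpy as np

      def overlap_save(x, h, block=256):
          m = len(h)
          step = block - (m - 1)             # new output samples per block
          H = np.fft.fft(h, block)
          xp = np.concatenate([np.zeros(m - 1), x])
          out = []
          for i in range(0, len(x), step):
              seg = xp[i:i + block]
              seg = np.pad(seg, (0, block - len(seg)))   # zero-pad the last block
              y = np.fft.ifft(np.fft.fft(seg) * H)
              out.append(y[m - 1:].real)     # first m-1 samples are aliased
          return np.concatenate(out)[:len(x)]

      x, h = np.random.randn(1000), np.random.randn(31)
      assert np.allclose(overlap_save(x, h), np.convolve(x, h)[:len(x)], atol=1e-9)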

  10. Xyce parallel electronic simulator.

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  11. Development of Parallel GSSHA

    DTIC Science & Technology

    2013-09-01

    Paul R. Eller, Jing-Ru C. Cheng, Aaron R. Byrd, Charles W. Downer, and Nawa Pradhan. Development of Parallel GSSHA. ERDC TR-13-8, Information Technology Laboratory, US Army Engineer Research and Development Center, September 2013. Approved for public release.

  12. Parallel unstructured grid generation

    NASA Technical Reports Server (NTRS)

    Loehner, Rainald; Camberos, Jose; Merriam, Marshal

    1991-01-01

    A parallel unstructured grid generation algorithm is presented and implemented on the Hypercube. Different processor hierarchies are discussed, and the appropriate hierarchies for mesh generation and mesh smoothing are selected. A domain-splitting algorithm for unstructured grids which tries to minimize the surface-to-volume ratio of each subdomain is described. This splitting algorithm is employed both for grid generation and grid smoothing. Results obtained on the Hypercube demonstrate the effectiveness of the algorithms developed.

  13. Implementation of Parallel Algorithms

    DTIC Science & Technology

    1993-06-30

    their social relations or to achieve some goals. For example, we define a pair-wise force law of repulsion and attraction for a group of identical...quantization based compression schemes. Photo-refractive crystals, which provide high density recording in real time, are used as our holographic media. The...of Parallel Algorithms (J. Reif, ed.). Kluwer Academic Publishers, 1993. (4) "A Dynamic Separator Algorithm", D. Armon and J. Reif. To appear in

  14. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Painter, J.; Hansen, C.

    1996-10-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.

  15. Trajectory optimization using parallel shooting method on parallel computer

    SciTech Connect

    Wirthman, D.J.; Park, S.Y.; Vadali, S.R.

    1995-03-01

    The efficiency of a parallel shooting method on a parallel computer for solving a variety of optimal control guidance problems is studied. Several examples are considered to demonstrate that a speedup of nearly 7 to 1 is achieved with the use of 16 processors. It is suggested that further improvements in performance can be achieved by parallelizing in the state domain. 10 refs.

  16. A low-fat yoghurt supplemented with a rooster comb extract on muscle joint function in adults with mild knee pain: a randomized, double blind, parallel, placebo-controlled, clinical trial of efficacy.

    PubMed

    Solà, Rosa; Valls, Rosa-Maria; Martorell, Isabel; Giralt, Montserrat; Pedret, Anna; Taltavull, Núria; Romeu, Marta; Rodríguez, Àurea; Moriña, David; Lopez de Frutos, Victor; Montero, Manuel; Casajuana, Maria-Carmen; Pérez, Laura; Faba, Jenny; Bernal, Gloria; Astilleros, Anna; González, Roser; Puiggrós, Francesc; Arola, Lluís; Chetrit, Carlos; Martinez-Puig, Daniel

    2015-11-01

    Preliminary results suggested that oral administration of rooster comb extract (RCE) rich in hyaluronic acid (HA) was associated with improved muscle strength. Following these promising results, the objective of the present study was to evaluate the effect of low-fat yoghurt supplemented with RCE rich in HA on muscle function in adults with mild knee pain, a symptom of early osteoarthritis. Participants (n = 40) received low-fat yoghurt (125 mL d⁻¹) supplemented with 80 mg d⁻¹ of RCE, and the placebo group (n = 40) consumed the same yoghurt without the RCE, in a randomized, controlled, double-blind, parallel trial over 12 weeks. Using an isokinetic dynamometer (Biodex System 4), RCE consumption, compared to control, increased the affected knee peak torque, total work and mean power at 180° s⁻¹ by at least 11% in men (p < 0.05), with no differences in women. No dietary differences were noted. These results suggest that long-term consumption of low-fat yoghurt supplemented with RCE could be a dietary tool to improve muscle strength in men, with possible clinical significance. However, further studies are needed to elucidate the reasons for the sex differences observed, and may provide further insight into muscle function.

  17. Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

    NASA Technical Reports Server (NTRS)

    Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

    1990-01-01

    Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.

  19. Resistor Combinations for Parallel Circuits.

    ERIC Educational Resources Information Center

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
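
    The arithmetic behind such tables is the reciprocal rule 1/R_total = 1/R1 + 1/R2; the short Python sketch below (value range chosen arbitrarily) enumerates pairs whose parallel combination is a whole number.

      # Find resistor pairs whose parallel combination is a whole number.
      def parallel(r1, r2):
          return r1 * r2 / (r1 + r2)

      pairs = [(r1, r2, int(parallel(r1, r2)))
               for r1 in range(1, 101) for r2 in range(r1, 101)
               if parallel(r1, r2).is_integer()]
      print(pairs[:5])   # (2, 2, 1), (3, 6, 2), (4, 4, 2), (4, 12, 3), (5, 20, 4)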

  20. Status of TRANSP Parallel Services

    NASA Astrophysics Data System (ADS)

    Indireshkumar, K.; Andre, Robert; McCune, Douglas; Randerson, Lewis

    2006-10-01

    The PPPL TRANSP code suite has been used successfully over many years to carry out time dependent simulations of tokamak plasmas. However, accurately modeling certain phenomena such as RF heating and fast ion behavior using TRANSP requires extensive computational power and will benefit from parallelization. Parallelizing all of TRANSP is not required: some parts will run sequentially while other parts run in parallel. To efficiently use a site's parallel services, the parallelized TRANSP modules are deployed to a shared "parallel service" on a separate cluster. The PPPL Monte Carlo fast ion module NUBEAM and the MIT RF module TORIC are the first TRANSP modules to be so deployed. This poster will show the performance scaling of these modules within the parallel server. Communications between the serial client and the parallel server will be described in detail, and measurements of startup and communications overhead will be shown. Physics modeling benefits for TRANSP users will be assessed.

  1. Asynchronous interpretation of parallel microprograms

    SciTech Connect

    Bandman, O.L.

    1984-03-01

    In this article, the authors demonstrate how to pass from a given synchronous interpretation of a parallel microprogram to an equivalent asynchronous interpretation, and investigate the cost associated with the rejection of external synchronization in parallel microprogram structures.

  2. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.

  3. Resistor Combinations for Parallel Circuits.

    ERIC Educational Resources Information Center

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)

  4. Global Arrays Parallel Programming Toolkit

    SciTech Connect

    Nieplocha, Jaroslaw; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Harrison, Robert J.; Chavarría-Miranda, Daniel

    2011-01-01

    The two predominant classes of programming models for parallel computing are distributed memory and shared memory. Both have advantages and shortcomings. The shared memory model is much easier to use, but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic can have a negative impact on performance and scalability. Careful code restructuring to increase data reuse and replacing fine-grain load/stores with block access to shared data can address the problem and yield performance for shared memory that is competitive with message-passing. However, this performance comes at the cost of compromising the ease of use that the shared memory model advertises. Distributed memory models, such as message-passing or one-sided communication, offer performance and scalability but they are difficult to program. The Global Arrays toolkit attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be specified by the programmer and hence managed. GA is related to the global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran. In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF, ZPL, and Data Parallel C. However, the Global Array programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving parallel efficiency.
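
    The get/compute/put style described above can be mimicked in a toy Python sketch using shared memory; this is an illustration of the programming model only, with made-up helper names, and is not the Global Arrays API.

      # A 'global' array in one address space, accessed via explicit get/put
      # so that data movement between global and local storage is visible.
      import numpy as np
      from multiprocessing import shared_memory

      def ga_view(shm, n):
          return np.ndarray((n,), dtype=np.float64, buffer=shm.buf)

      def ga_get(shm, n, lo, hi):            # global section -> local copy
          return ga_view(shm, n)[lo:hi].copy()

      def ga_put(shm, n, lo, local):         # local buffer -> global section
          ga_view(shm, n)[lo:lo + len(local)] = local

      n = 1024
      shm = shared_memory.SharedMemory(create=True, size=n * 8)
      ga_view(shm, n)[:] = 0.0
      block = ga_get(shm, n, 0, 128)         # work on a fast local copy
      block += 1.0
      ga_put(shm, n, 0, block)               # write results back explicitly
      shm.close(); shm.unlink()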

  5. The Structure of Parallel Algorithms.

    DTIC Science & Technology

    1979-08-01

    parallel architectures and parallel algorithms see [Anderson and Jensen 75, Stone 75, Kung 76, Enslow 77, Kuck 77, Ramamoorthy and Li 77, Sameh 77, Heller...the Routing Time on a Parallel Computer with a Fixed Interconnection Network, In Kuck, D.J., Lawrie, D.H. and Sameh, A.H., editor, High Speed...Letters 5(4):107-112, October 1976. [Sameh 77] Sameh, A.H. Numerical Parallel Algorithms -- A Survey. In High Speed Computer and Algorithm Organization

  6. Parallel Debugging Using Graphical Views

    DTIC Science & Technology

    1988-03-01

    Voyeur, a prototype system for creating graphical views of parallel programs, provides a cost-effective way to construct such views for any parallel programming system. We illustrate Voyeur by discussing four views created for debugging Poker programs. One is a general trace facility for any Poker...Graphical views are essential for debugging parallel programs because of the large quantity of state information contained in parallel programs. Voyeur

  7. Parallel Pascal - An extended Pascal for parallel computers

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.

    1984-01-01

    Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.

  9. Parallel Eclipse Project Checkout

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.

    2011-01-01

    Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (xml file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to check out the plug-ins, this program makes the plug-in checkouts parallel. After parsing the feature, a request to check out each plug-in in the feature is inserted. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to check out now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer's home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer's home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. It can be applied to any
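
    The core of the approach reads naturally as a thread pool over the parsed feature; the Python sketch below is an editorial illustration, with a hypothetical repository URL and 'svn checkout' standing in for the real version-control command, not the PEPC source.

      # Parse a feature.xml for plug-in ids, then check them out in parallel.
      import subprocess
      import xml.etree.ElementTree as ET
      from concurrent.futures import ThreadPoolExecutor

      def plugin_ids(feature_xml):
          return [p.get("id")
                  for p in ET.parse(feature_xml).getroot().iter("plugin")]

      def checkout(plugin_id):
          url = f"https://example.org/repo/{plugin_id}"   # hypothetical repository
          subprocess.run(["svn", "checkout", url, plugin_id], check=True)

      def parallel_checkout(feature_xml, threads=8):      # configurable pool size
          with ThreadPoolExecutor(max_workers=threads) as pool:
              list(pool.map(checkout, plugin_ids(feature_xml)))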

  10. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures are best suited to particular classes of problems. Architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

  11. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Hansen, C.; Painter, J.; de Verdiere, G.C.

    1995-05-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.
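
    A minimal Python sketch of the divide/render/composite pattern follows; the toy orthographic spheres, the shading, and a simple nearest-depth composite (rather than the optimal compositing scheme the paper uses) are illustrative assumptions.

      # Each worker renders a depth-buffered partial image of its share of the
      # spheres; partials are merged by keeping the nearest sample per pixel.
      import numpy as np
      from multiprocessing import Pool

      W = H = 64

      def render_partial(spheres):           # spheres: list of (x, y, z, radius)
          color = np.zeros((H, W))
          depth = np.full((H, W), np.inf)
          ys, xs = np.ogrid[:H, :W]
          for x, y, z, r in spheres:
              hit = ((xs - x) ** 2 + (ys - y) ** 2 <= r ** 2) & (z < depth)
              depth[hit] = z
              color[hit] = 1.0 / (1.0 + z)   # toy depth-based shading
          return color, depth

      def composite(partials):
          color, depth = partials[0]
          for c, d in partials[1:]:
              nearer = d < depth
              color = np.where(nearer, c, color)
              depth = np.minimum(d, depth)
          return color

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          spheres = [(rng.uniform(0, W), rng.uniform(0, H), rng.uniform(1, 10), 5.0)
                     for _ in range(40)]
          with Pool(4) as pool:
              partials = pool.map(render_partial, [spheres[i::4] for i in range(4)])
          image = composite(partials)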

  12. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    SciTech Connect

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  13. Roo: A parallel theorem prover

    SciTech Connect

    Lusk, E.L.; McCune, W.W.; Slaney, J.K.

    1991-11-01

    We describe a parallel theorem prover based on the Argonne theorem-proving system OTTER. The parallel system, called Roo, runs on shared-memory multiprocessors such as the Sequent Symmetry. We explain the parallel algorithm used and give performance results that demonstrate near-linear speedups on large problems.

  14. CSM parallel structural methods research

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.

    1989-01-01

    Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.

  15. Parallel Tree Contraction and Its Application.

    DTIC Science & Technology

    1985-12-01

    observed by Uspensky [23], see [12]. These bounds are commonly known as Chernoff bounds [6]. We shall use the following simply stated bounds [3, Theorem 6...Functions in Logarithmic Parallel Time. 25th Annual Symp. on Foundations of Computer Science, IEEE, 1984, pp. 12-22. 22. J. Uspensky. Introduction to

  16. SMM parallel battery operation in orbit

    NASA Technical Reports Server (NTRS)

    Broderick, R.

    1982-01-01

    A parallel battery system for the SMM spacecraft is described. The battery system performance as a function of lifetime over orbit was evaluated. The following equipment performance specifications were examined during a typical orbit: battery current and discharges, voltage limitations, battery temperature variations, and current sensor performance. Tabulated battery performance data is also included.

  17. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  18. Parallel Function Strategy in Pronoun Assignment

    ERIC Educational Resources Information Center

    Grober, Ellen H.; And Others

    1978-01-01

    Subjects completed sentences of the form NP1 aux V NP2 because (but) Pro...(e.g., John may scold Bill because he...) with a reason or motive for the action described. A basic perceptual strategy was hypothesized to underlie the comprehension of these sentences which have a potentially ambiguous pronoun in the subject position of the subordinate…

  19. Parallel computation of Gaussian processes

    NASA Astrophysics Data System (ADS)

    Preuss, R.; von Toussaint, U.

    2017-06-01

    Within the Bayesian framework we utilize Gaussian processes for parametric studies of long-running computer codes. Since the simulations are expensive, it is necessary to exploit the computational budget in the best possible manner. Employing the sum over variances, an indicator of the quality of the fit, as the utility function, we established an optimized and automated sequential parameter selection procedure. However, it is often also desirable to utilize the parallel running capabilities of present computer technology and abandon sequential parameter selection for a faster overall turn-around time (wall-clock time). The paper proposes to achieve this by marginalizing over the expected outcomes at optimized test points in order to set up a pool of starting values for batch execution.
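
    One common way to realize such marginalization is the 'kriging believer' heuristic: each selected test point is provisionally assigned its GP posterior mean so the next point can be chosen before any simulation runs. The Python sketch below (scikit-learn, variance-based utility) is an illustrative reading of that idea, not the authors' code.

      # Build a batch of test points for parallel execution: refit the GP,
      # pick the point with the largest predictive variance, plug in the
      # predicted mean as a stand-in outcome, and repeat.
      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor

      def select_batch(X, y, candidates, batch_size=4):
          X, y = [np.asarray(r) for r in X], list(y)
          batch = []
          for _ in range(batch_size):
              gp = GaussianProcessRegressor().fit(np.array(X), np.array(y))
              mu, sigma = gp.predict(candidates, return_std=True)
              best = int(np.argmax(sigma))         # utility: predictive variance
              batch.append(candidates[best])
              X.append(candidates[best])           # marginalize over the outcome
              y.append(mu[best])                   # by assuming the GP mean
          return batch                             # evaluate these in parallel

      X0 = [[0.0], [0.5], [1.0]]
      y0 = [np.sin(x[0]) for x in X0]
      cands = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
      print(select_batch(X0, y0, cands))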

  20. Tolerant (parallel) Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2³ is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

  1. Benchmarking massively parallel architectures

    SciTech Connect

    Lubeck, O.; Moore, J.; Simmons, M.; Wasserman, H.

    1993-01-01

    The purpose of this paper is to summarize some initial experiences related to measuring the performance of massively parallel processors (MPPs) at Los Alamos National Laboratory (LANL). Actually, the range of MPP architectures the authors have used is rather limited, being confined mostly to the Thinking Machines Corporation (TMC) Connection Machine CM-2 and CM-5. Some very preliminary work has been carried out on the Kendall Square KSR-1, and efforts related to other machines, such as the Intel Paragon and the soon-to-be-released CRAY T3D, are planned. This paper will concentrate on methodology rather than specific architectural strengths and weaknesses; the latter is expected to be the subject of future reports. MPP benchmarking is a field in critical need of structure and definition. As the authors have stated previously, such machines have enormous potential, and there is certainly a dire need for orders-of-magnitude more computational power than current supercomputers provide. However, performance reports for MPPs must emphasize actual sustainable performance from real applications in a careful, responsible manner. Such has not always been the case. A recent paper has described in some detail the problem of potentially misleading performance reporting in the parallel scientific computing field. Thus, in this paper, the authors briefly offer a few general ideas on MPP performance analysis.

  2. Parallelizing quantum circuit synthesis

    NASA Astrophysics Data System (ADS)

    Di Matteo, Olivia; Mosca, Michele

    2016-03-01

    Quantum circuit synthesis is the process in which an arbitrary unitary operation is decomposed into a sequence of gates from a universal set, typically one which a quantum computer can implement both efficiently and fault-tolerantly. As physical implementations of quantum computers improve, the need is growing for tools that can effectively synthesize components of the circuits and algorithms they will run. Existing algorithms for exact, multi-qubit circuit synthesis scale exponentially in the number of qubits and circuit depth, leaving synthesis intractable for circuits on more than a handful of qubits. Even modest improvements in circuit synthesis procedures may lead to significant advances, pushing forward the boundaries of not only the size of solvable circuit synthesis problems, but also in what can be realized physically as a result of having more efficient circuits. We present a method for quantum circuit synthesis using deterministic walks. Also termed pseudorandom walks, these are walks in which once a starting point is chosen, its path is completely determined. We apply our method to construct a parallel framework for circuit synthesis, and implement one such version performing optimal T-count synthesis over the Clifford+T gate set. We use our software to present examples where parallelization offers a significant speedup on the runtime, as well as directly confirm that the 4-qubit 1-bit full adder has optimal T-count 7 and T-depth 3.

  3. Parallel Eigenvalue extraction

    NASA Technical Reports Server (NTRS)

    Akl, Fred A.

    1989-01-01

    A new numerical algorithm for the solution of large-order eigenproblems typically encountered in linear elastic finite element systems is presented. The architecture of parallel processing is utilized in the algorithm to achieve increased speed and efficiency of calculations. The algorithm is based on the frontal technique for the solution of linear simultaneous equations and the modified subspace eigenanalysis method for the solution of the eigenproblem. Assembly, elimination and back-substitution of degrees of freedom are performed concurrently, using a number of fronts. All fronts converge to and diverge from a predefined global front during elimination and back-substitution, respectively. In the meantime, reduction of the stiffness and mass matrices required by the modified subspace method can be completed during the convergence/divergence cycle and an estimate of the required eigenpairs obtained. Successive cycles of convergence and divergence are repeated until the desired accuracy of calculations is achieved. The advantages of this new algorithm in parallel computer architecture are discussed.

  4. Massively Parallel QCD

    SciTech Connect

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-04-11

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.

  5. Parallel ptychographic reconstruction

    PubMed Central

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-01-01

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by the scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source. PMID:25607174

  6. Parallel ptychographic reconstruction

    SciTech Connect

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-12-19

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by the scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.

  7. Applied Parallel Metadata Indexing

    SciTech Connect

    Jacobi, Michael R

    2012-08-01

    The GPFS Archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and search the archive using standard and user-defined metadata, while maintaining security. Last summer, I developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, the author implemented security and developed the user interface for the search tool. To meet security requirements, each database table is associated with a single user, which only stores records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.

  8. Programming parallel architectures - The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  9. Constructing higher order DNA origami arrays using DNA junctions of anti-parallel/parallel double crossovers

    NASA Astrophysics Data System (ADS)

    Ma, Zhipeng; Park, Seongsu; Yamashita, Naoki; Kawai, Kentaro; Hirai, Yoshikazu; Tsuchiya, Toshiyuki; Tabata, Osamu

    2016-06-01

    DNA origami provides a versatile method for the construction of nanostructures with defined shape, size and other properties; such nanostructures may enable hierarchical assembly of large-scale architectures for the placement of other nanomaterials with atomic precision. However, the effective use of these higher order structures as functional components depends on knowledge of their assembly behavior and mechanical properties. This paper demonstrates the construction of higher order DNA origami arrays with controlled orientations based on the formation of two types of DNA junctions: anti-parallel and parallel double crossovers. A two-step assembly process, in which preformed rectangular DNA origami monomer structures themselves undergo further self-assembly to form numerically unlimited arrays, was investigated to reveal the influences of the assembly parameters. AFM observations showed that when parallel double crossover DNA junctions are used, the assembly of DNA origami arrays occurs with fewer monomers than for structures formed using anti-parallel double crossovers, given the same assembly parameters, indicating that the configuration of parallel double crossovers is not energetically preferred. However, direct measurement by AFM force-controlled mapping shows that DNA junctions of both anti-parallel and parallel double crossovers are as mechanically stable as any other part of the DNA origami.

  10. A systolic array parallelizing compiler

    SciTech Connect

    Tseng, P.S. )

    1990-01-01

    This book presents a completely new approach to the problem of building a systolic array parallelizing compiler. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler able to generate efficient parallel code for complete LINPACK routines. The book begins by analyzing the architectural strengths of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. A complete listing of the source program and the compiler-generated parallel code is given to clarify the overall picture of the compiler. The book concludes that a systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.

  11. Parallel Computing in SCALE

    SciTech Connect

    DeHart, Mark D; Williams, Mark L; Bowman, Stephen M

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

  12. Parallel mechanisms for visual search in zebrafish.

    PubMed

    Proulx, Michael J; Parker, Matthew O; Tahir, Yasser; Brennan, Caroline H

    2014-01-01

    Parallel visual search mechanisms have been reported previously only in mammals and birds, not in animals lacking an expanded telencephalon, such as bees. Here we report the first evidence for parallel visual search in fish, using a choice task in which the fish had to find a target amongst an increasing number of distractors. Following two-choice discrimination training, zebrafish were presented with the original stimulus within an increasing array of distractor stimuli. We found that zebrafish exhibit no significant change in accuracy and approach latency as the number of distractors increases, providing evidence of parallel processing. This evidence challenges theories of vertebrate neural architecture and the importance of an expanded telencephalon for the evolution of executive function.

  13. Simulating Billion-Task Parallel Programs

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J

    2014-01-01

    In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, proof-of-concept scaling to over 0.22 billion virtual MPI processes is achieved on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, with a multiplexing ratio of 1024 simulated tasks per real task.
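
    The quoted figures are easy to cross-check: the virtual task count is just the core count times the multiplexing ratio. A two-line verification, using only numbers from the abstract:

    ```python
    # Cross-check of the scaling figures quoted in the abstract.
    real_cores = 216_000            # Cray XT5 cores used
    multiplex = 1024                # simulated MPI tasks per real task
    print(real_cores * multiplex)   # 221184000, i.e. ~0.22 billion virtual ranks
    ```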

  14. Extending HPF for advanced data parallel applications

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

    1994-01-01

    The stated goal of High Performance Fortran (HPF) was to 'address the problems of writing data parallel programs where the distribution of data affects performance'. After examining the current version of the language we are led to the conclusion that HPF has not fully achieved this goal. While the basic distribution functions offered by the language - regular block, cyclic, and block cyclic distributions - can support regular numerical algorithms, advanced applications such as particle-in-cell codes or unstructured mesh solvers cannot be expressed adequately. We believe that this is a major weakness of HPF, significantly reducing its chances of becoming accepted in the numeric community. The paper discusses the data distribution and alignment issues in detail, points out some flaws in the basic language, and outlines possible future paths of development. Furthermore, we briefly deal with the issue of task parallelism and its integration with the data parallel paradigm of HPF.
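
    For readers unfamiliar with the distribution functions named above, here is a minimal illustration of how a 1-D index maps to a processor under regular block, cyclic, and block-cyclic distributions. It follows the standard definitions; it is not HPF code.

    ```python
    # Owner computation for the three regular HPF-style distributions (1-D).
    def block_owner(i, n, p):
        """Regular block: contiguous chunks of ceil(n/p) elements each."""
        return i // -(-n // p)              # -(-n // p) computes ceil(n/p)

    def cyclic_owner(i, p):
        """Cyclic: elements dealt round-robin to the p processors."""
        return i % p

    def block_cyclic_owner(i, b, p):
        """Block-cyclic: blocks of b elements dealt round-robin."""
        return (i // b) % p

    n, p, b = 16, 4, 2
    print([block_owner(i, n, p) for i in range(n)])         # [0,0,0,0,1,1,1,1,...]
    print([cyclic_owner(i, p) for i in range(n)])           # [0,1,2,3,0,1,2,3,...]
    print([block_cyclic_owner(i, b, p) for i in range(n)])  # [0,0,1,1,2,2,3,3,...]
    ```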

  15. Parallel Polarization State Generation

    NASA Astrophysics Data System (ADS)

    She, Alan; Capasso, Federico

    2016-05-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by using a digital micromirror device to modulate spatially separated polarization components of a laser, which are subsequently beam-combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
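
    The serial-versus-parallel contrast above reduces to two lines of linear algebra: cascaded polarization elements compose as a matrix product, while spatially separated components that are weighted and beam-combined add as a matrix sum. A minimal numpy sketch with illustrative Jones matrices (the specific elements and weights are arbitrary, not from the paper):

    ```python
    # Serial vs. parallel composition of polarization elements (Jones calculus).
    import numpy as np

    H = np.array([[1, 0], [0, 0]], dtype=complex)       # horizontal linear polarizer
    Q = np.array([[1, 0], [0, 1j]], dtype=complex)      # quarter-wave plate, fast axis x

    serial = Q @ H                   # serial architecture: product of matrices

    w = [0.7, 0.3]                   # arm weights set by the intensity modulator
    parallel = w[0] * H + w[1] * Q   # parallel architecture: sum of matrices

    e_in = np.array([1, 1], dtype=complex) / np.sqrt(2)  # 45-degree linear input
    print("serial:", serial @ e_in, " parallel:", parallel @ e_in)
    ```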

  16. Parallel Polarization State Generation.

    PubMed

    She, Alan; Capasso, Federico

    2016-05-17

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by using a digital micromirror device to modulate spatially separated polarization components of a laser, which are subsequently beam-combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.

  17. Toward Parallel Document Clustering

    SciTech Connect

    Mogill, Jace A.; Haglin, David J.

    2011-09-01

    A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion-dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a triangle-inequality-obeying distance metric; the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program, and initial performance results of an end-to-end document processing workflow are reported.

  18. Parallel tridiagonal equation solvers

    NASA Technical Reports Server (NTRS)

    Stone, H. S.

    1974-01-01

    Three parallel algorithms were compared for the direct solution of tridiagonal linear systems of equations. The algorithms are suitable for computers such as ILLIAC 4 and CDC STAR. For array computers similar to ILLIAC 4, cyclic odd-even reduction has the least operation count for highly structured sets of equations, and recursive doubling has the least count for relatively unstructured sets of equations. Since the difference in operation counts for these two algorithms is not substantial, their relative running times may be more related to overhead operations, which are not measured in this paper. The third algorithm, based on Buneman's Poisson solver, has more arithmetic operations than the others, and appears to be the least favorable. For pipeline computers similar to CDC STAR, cyclic odd-even reduction appears to be the most preferable algorithm for all cases.
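
    Since cyclic odd-even reduction is the preferred algorithm in most of the cases above, a compact reference implementation may help. This is a serial sketch (each reduction level would run concurrently on an array machine), assuming n = 2^k - 1 unknowns and a well-conditioned system.

    ```python
    # Cyclic (odd-even) reduction for a tridiagonal system; illustrative serial
    # version. a = sub-diagonal (a[0] = 0), b = diagonal, c = super-diagonal
    # (c[-1] = 0), d = right-hand side. Assumes n = 2**k - 1.
    import numpy as np

    def cyclic_reduction(a, b, c, d):
        n = len(b)
        if n == 1:
            return np.array([d[0] / b[0]])
        k = np.arange(1, n, 2)                     # odd equations survive this level
        al, ga = a[k] / b[k - 1], c[k] / b[k + 1]
        x = np.empty(n)
        x[k] = cyclic_reduction(-al * a[k - 1],
                                b[k] - al * c[k - 1] - ga * a[k + 1],
                                -ga * c[k + 1],
                                d[k] - al * d[k - 1] - ga * d[k + 1])
        xp = np.concatenate(([0.0], x, [0.0]))     # pad so x[-1] = x[n] = 0
        e = np.arange(0, n, 2)                     # back-substitute even unknowns
        x[e] = (d[e] - a[e] * xp[e] - c[e] * xp[e + 2]) / b[e]
        return x

    n = 7
    a = np.r_[0.0, -np.ones(n - 1)]; c = np.r_[-np.ones(n - 1), 0.0]
    b = 2.0 * np.ones(n); d = np.ones(n)
    A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
    print(np.allclose(cyclic_reduction(a, b, c, d), np.linalg.solve(A, d)))  # True
    ```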

  19. Parallel Polarization State Generation

    PubMed Central

    She, Alan; Capasso, Federico

    2016-01-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by using a digital micromirror device to modulate spatially separated polarization components of a laser, which are subsequently beam-combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813

  20. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
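
    Stripped of patent language, the claim describes a standard wait-and-wake pattern: an advance function with no actionable events parks its thread until a communications event arrives for its context. A hypothetical Python sketch of that pattern (an illustration only, not the PAMI implementation):

    ```python
    # Hypothetical sketch of the claimed pattern: the advance function sleeps on
    # a condition variable until a communications event arrives for its context.
    import threading
    from collections import deque

    class Context:
        def __init__(self):
            self.events = deque()
            self.cv = threading.Condition()

        def post_event(self, ev):          # called when a communications event occurs
            with self.cv:
                self.events.append(ev)
                self.cv.notify()           # awaken the waiting advance thread

        def advance(self):
            with self.cv:
                while not self.events:     # no actionable events pending...
                    self.cv.wait()         # ...place this thread into a wait state
                ev = self.events.popleft()
            print("processing", ev)        # process the now-pending event

    ctx = Context()
    t = threading.Thread(target=ctx.advance)
    t.start()
    ctx.post_event("recv-complete")
    t.join()
    ```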

  1. Parallel imaging microfluidic cytometer.

    PubMed

    Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching

    2011-01-01

    By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take.

  2. A parallel programming environment supporting multiple data-parallel modules

    SciTech Connect

    Seevers, B.K.; Quinn, M.J. ); Hatcher, P.J. )

    1992-10-01

    We describe a system that allows programmers to take advantage of both control and data parallelism through multiple intercommunicating data-parallel modules. This programming environment extends C-type stream I/O to include intermodule communication channels. The programmer writes each module as a separate data-parallel program, then develops a channel linker specification describing how to connect the modules together. A channel linker we have developed loads the separate modules on the parallel machine and binds the communication channels together as specified. We present performance data that demonstrates a mixed control- and data-parallel solution can yield better performance than a strictly data-parallel solution. The system described currently runs on the Intel iWarp multicomputer.

  3. Detecting opportunities for parallel observations on the Hubble Space Telescope

    NASA Technical Reports Server (NTRS)

    Lucks, Michael

    1992-01-01

    The presence of multiple scientific instruments aboard the Hubble Space Telescope provides opportunities for parallel science, i.e., the simultaneous use of different instruments for different observations. Determining whether candidate observations are suitable for parallel execution depends on numerous criteria (some involving quantitative tradeoffs) that may change frequently. A knowledge-based approach is presented for constructing a scoring function to rank candidate pairs of observations for parallel science. In the Parallel Observation Matching System (POMS), spacecraft knowledge and schedulers' preferences are represented using a uniform set of mappings, or knowledge functions. Assessment of parallel science opportunities is achieved via composition of the knowledge functions in a prescribed manner. The knowledge acquisition and explanation facilities of the system are also presented. The methodology is applicable to many other multiple-criteria assessment problems.
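
    The "uniform set of mappings" idea admits a small sketch: each criterion becomes a knowledge function scoring a candidate pair in [0, 1], and pairs are ranked by a prescribed composition of those functions. The criteria, names, and product composition below are hypothetical stand-ins, not POMS's actual rules.

    ```python
    # Hypothetical sketch: score a candidate observation pair by composing
    # per-criterion knowledge functions, each mapping the pair into [0, 1].
    def instruments_differ(a, b):
        return 1.0 if a["instrument"] != b["instrument"] else 0.0

    def pointing_compatible(a, b, tol_arcsec=120.0):
        return max(0.0, 1.0 - abs(a["offset_arcsec"] - b["offset_arcsec"]) / tol_arcsec)

    def durations_match(a, b):
        return min(a["duration_s"], b["duration_s"]) / max(a["duration_s"], b["duration_s"])

    KNOWLEDGE_FUNCTIONS = [instruments_differ, pointing_compatible, durations_match]

    def parallel_score(a, b):
        score = 1.0
        for f in KNOWLEDGE_FUNCTIONS:       # prescribed composition: a product
            score *= f(a, b)
        return score

    obs1 = {"instrument": "WFPC2", "offset_arcsec": 0.0, "duration_s": 900}
    obs2 = {"instrument": "FOS", "offset_arcsec": 40.0, "duration_s": 600}
    print(round(parallel_score(obs1, obs2), 3))   # higher = better parallel candidate
    ```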

  4. Combinatorial parallel and scientific computing.

    SciTech Connect

    Pinar, Ali; Hendrickson, Bruce Alan

    2005-04-01

    Combinatorial algorithms have long played a pivotal enabling role in many applications of parallel computing. Graph algorithms in particular arise in load balancing, scheduling, mapping and many other aspects of the parallelization of irregular applications. These are still active research areas, mostly due to evolving computational techniques and rapidly changing computational platforms. But the relationship between parallel computing and discrete algorithms is much richer than the mere use of graph algorithms to support the parallelization of traditional scientific computations. Important, emerging areas of science are fundamentally discrete, and they are increasingly reliant on the power of parallel computing. Examples include computational biology, scientific data mining, and network analysis. These applications are changing the relationship between discrete algorithms and parallel computing. In addition to their traditional role as enablers of high performance, combinatorial algorithms are now customers for parallel computing. New parallelization techniques for combinatorial algorithms need to be developed to support these nontraditional scientific approaches. This chapter will describe some of the many areas of intersection between discrete algorithms and parallel scientific computing. Due to space limitations, this chapter is not a comprehensive survey, but rather an introduction to a diverse set of techniques and applications with a particular emphasis on work presented at the Eleventh SIAM Conference on Parallel Processing for Scientific Computing. Some topics highly relevant to this chapter (e.g. load balancing) are addressed elsewhere in this book, and so we will not discuss them here.

  5. Explicit Parallelization of Robert-Bonamy Formalism

    NASA Astrophysics Data System (ADS)

    Styers, John M.; Gamache, Robert

    2014-06-01

    Robert-Bonamy formalism has long been employed in computational spectroscopy. As a method, it presents a fine balance between accuracy and computational viability. While within the bounds of present-day computational resources, its calculations still carry significant computational overhead, the vast majority of which lies in computing the resonance functions. Major aspects of the resonance-function calculation are extremely repetitive, presenting a problem that is almost "embarrassingly parallel". The computation of the resonance functions has been explicitly parallelized, resulting in an order of magnitude speed-up on local Macintosh machines and multiple orders of magnitude speed-up on two Cray supercomputers (Darter and MGHPCC). This will facilitate further scientific investigation.
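
    The structure being exploited is the classic embarrassingly parallel one: many independent evaluations of the same function over different parameters. A minimal sketch with a placeholder integrand (the real resonance functions are far more involved):

    ```python
    # Embarrassingly parallel evaluation of independent function instances.
    # resonance() is a cheap placeholder, not the Robert-Bonamy resonance function.
    from multiprocessing import Pool
    import math

    def resonance(k):
        # Stand-in for one repetitive, independent evaluation.
        return sum(math.exp(-k * i * 1e-4) * math.cos(i * 1e-4) for i in range(50_000))

    if __name__ == "__main__":
        params = [0.1 * j for j in range(1, 65)]   # 64 independent parameter values
        with Pool() as pool:                       # one worker per available core
            results = pool.map(resonance, params)  # same results, divided wall-clock
        print(results[:3])
    ```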

  6. A parallel algorithm for channel routing on a hypercube

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall; Banerjee, Prithviraj

    1987-01-01

    A new parallel simulated annealing algorithm for channel routing on a P-processor hypercube is presented. The basic idea used is to partition a set of tracks equally among processors in the hypercube. In parallel, P/2 pairs of processors perform displacements and exchanges of nets between tracks, compute the changes in cost functions, and accept moves using a parallel annealing criterion. Through the use of a unique distributed data structure, it is possible to minimize message traffic and add versatility and efficiency to a parallel routing tool. The algorithm has been implemented and is being tested on some of the popular channel problems from the literature.

  7. The parallel I/O architecture of the high performance storage system (HPSS). Revision 1

    SciTech Connect

    Watson, R.W.; Coyne, R.A.

    1995-04-01

    Datasets up to terabyte size and petabyte capacities have created a serious imbalance between I/O and storage system performance and system functionality. One promising approach is the use of parallel data transfer techniques for client access to storage, peripheral-to-peripheral transfers, and remote file transfers. This paper describes the parallel I/O architecture and mechanisms, Parallel Transport Protocol (PTP), parallel FTP, and parallel client Application Programming Interface (API) used by the High Performance Storage System (HPSS). Parallel storage integration issues with a local parallel file system are also discussed.

  8. The BLAZE language: A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.
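
    Blaze itself predates modern toolchains, but the flavor of its array arithmetic, forall loops, and accumulation operators can be suggested with numpy; the following is an analogy, not Blaze syntax.

    ```python
    # NumPy analogue of Blaze-style array arithmetic, forall, and accumulation.
    import numpy as np

    a = np.arange(10.0)
    b = np.linspace(0.0, 1.0, 10)

    c = 2.0 * a + b                 # array arithmetic: elementwise, no explicit loop
    d = np.sin(a) * b               # in effect a "forall i" body over every index
    total = np.add.reduce(c)        # APL-style accumulation (+/ in APL notation)
    running = np.add.accumulate(c)  # the scan form of the same accumulation
    print(np.isclose(total, running[-1]))   # True: both give the full sum
    ```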

  9. The BLAZE language - A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  10. The BLAZE language - A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  11. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Lau, Sonie

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of the parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.

  12. High Performance Parallel Architectures

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek; Kaewpijit, Sinthop

    1998-01-01

    Traditional remote sensing instruments are multispectral, where observations are collected at a few different spectral bands. Recently, many hyperspectral instruments, which can collect observations at hundreds of bands, have become operational. Furthermore, there have been ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these remote sensing technology developments hold great promise for new findings in the area of Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes, and methods for data reduction. Dimension Reduction is a spectral transformation, aimed at concentrating the vital information and discarding redundant data. One such transformation, which is widely used in remote sensing, is the Principal Components Analysis (PCA). This report summarizes our progress on the development of a parallel PCA and its implementation on two Beowulf cluster configurations: one with a fast Ethernet switch and the other with a Myrinet interconnection. Details of the implementation and performance results, for typical sets of multispectral and hyperspectral NASA remote sensing data, are presented and analyzed based on the algorithm requirements and the underlying machine configuration. It will be shown that the PCA application is quite challenging and hard to scale on Ethernet-based clusters. However, the measurements also show that a high-performance interconnection network, such as Myrinet, better matches the high communication demand of PCA and can lead to a more efficient PCA execution.
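
    As a reference point for the transformation itself, here is a minimal PCA sketch on a (bands x pixels) block, accumulating covariance statistics chunk by chunk the way per-node partial sums would be combined in the parallel version; the data and sizes are synthetic.

    ```python
    # PCA for spectral dimension reduction (illustrative). In the parallel version
    # each node computes the partial sums for its pixel chunk, then a reduction
    # combines them; here the chunks are processed serially.
    import numpy as np

    rng = np.random.default_rng(0)
    bands, pixels = 64, 10_000
    X = rng.normal(size=(bands, pixels))        # stand-in hyperspectral block

    s = np.zeros(bands); S = np.zeros((bands, bands)); n = 0
    for chunk in np.array_split(X, 8, axis=1):  # 8 "nodes", one chunk each
        s += chunk.sum(axis=1)                  # partial sum of spectra
        S += chunk @ chunk.T                    # partial sum of outer products
        n += chunk.shape[1]
    mean = s / n
    cov = S / n - np.outer(mean, mean)          # covariance from the reduced sums

    evals, evecs = np.linalg.eigh(cov)
    top = evecs[:, np.argsort(evals)[::-1][:8]] # keep the 8 strongest components
    reduced = top.T @ (X - mean[:, None])       # 64 bands -> 8 components
    print(reduced.shape)                        # (8, 10000)
    ```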

  13. High Performance Parallel Architectures

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek; Kaewpijit, Sinthop

    1998-01-01

    Traditional remote sensing instruments are multispectral, where observations are collected at a few different spectral bands. Recently, many hyperspectral instruments, which can collect observations at hundreds of bands, have become operational. Furthermore, there have been ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these remote sensing technology developments hold great promise for new findings in the area of Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes, and methods for data reduction. Dimension Reduction is a spectral transformation, aimed at concentrating the vital information and discarding redundant data. One such transformation, which is widely used in remote sensing, is the Principal Components Analysis (PCA). This report summarizes our progress on the development of a parallel PCA and its implementation on two Beowulf cluster configurations: one with a fast Ethernet switch and the other with a Myrinet interconnection. Details of the implementation and performance results, for typical sets of multispectral and hyperspectral NASA remote sensing data, are presented and analyzed based on the algorithm requirements and the underlying machine configuration. It will be shown that the PCA application is quite challenging and hard to scale on Ethernet-based clusters. However, the measurements also show that a high-performance interconnection network, such as Myrinet, better matches the high communication demand of PCA and can lead to a more efficient PCA execution.

  14. Parallelization and automatic data distribution for nuclear reactor simulations

    SciTech Connect

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high-performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  15. Parallel Adaptive Mesh Refinement

    SciTech Connect

    Diachin, L; Hornung, R; Plassmann, P; WIssink, A

    2005-03-04

    As large-scale, parallel computers have become more widely available and numerical models and algorithms have advanced, the range of physical phenomena that can be simulated has expanded dramatically. Many important science and engineering problems exhibit solutions with localized behavior where highly-detailed salient features or large gradients appear in certain regions which are separated by much larger regions where the solution is smooth. Examples include chemically-reacting flows with radiative heat transfer, high Reynolds number flows interacting with solid objects, and combustion problems where the flame front is essentially a two-dimensional sheet occupying a small part of a three-dimensional domain. Modeling such problems numerically requires approximating the governing partial differential equations on a discrete domain, or grid. Grid spacing is an important factor in determining the accuracy and cost of a computation. A fine grid may be needed to resolve key local features while a much coarser grid may suffice elsewhere. Employing a fine grid everywhere may be inefficient at best and, at worst, may make an adequately resolved simulation impractical. Moreover, the location and resolution of the fine grid required for an accurate solution is a dynamic property of a problem's transient features and may not be known a priori. Adaptive mesh refinement (AMR) is a technique that can be used with both structured and unstructured meshes to adjust local grid spacing dynamically to capture solution features with an appropriate degree of resolution. Thus, computational resources can be focused where and when they are needed most to efficiently achieve an accurate solution without incurring the cost of a globally-fine grid. Figure 1.1 shows two example computations using AMR; on the left is a structured mesh calculation of an impulsively-sheared contact surface and on the right is the fuselage and volume discretization of an RAH-66 Comanche helicopter [35]. Note the

  16. Parallelization of a blind deconvolution algorithm

    NASA Astrophysics Data System (ADS)

    Matson, Charles L.; Borelli, Kathy J.

    2006-09-01

    Often it is of interest to deblur imagery in order to obtain higher-resolution images. Deblurring requires knowledge of the blurring function - information that is often not available separately from the blurred imagery. Blind deconvolution algorithms overcome this problem by jointly estimating both the high-resolution image and the blurring function from the blurred imagery. Because blind deconvolution algorithms are iterative in nature, they can take minutes to days to deblur an image, depending on how many frames of data are used for the deblurring and the platforms on which the algorithms are executed. Here we present our progress in parallelizing a blind deconvolution algorithm to increase its execution speed. This progress includes sub-frame parallelization and a code structure that is not specialized to a specific computer hardware architecture.

  17. Sequential and Parallel Matrix Computations.

    DTIC Science & Technology

    1985-11-01

    Theory" published by the American Math Society. (C) Jointly with A. Sameh of University of Illinois, a parallel algorithm for the single-input pole...an M.Sc. thesis at Northern Illinois University by Ava Chun and, the results were compared with parallel Q-R algorithm of Sameh and Kuck and the

  18. Parallel pseudospectral domain decomposition techniques

    NASA Technical Reports Server (NTRS)

    Gottlieb, David; Hirsh, Richard S.

    1988-01-01

    The influence of interface boundary conditions on the ability to parallelize pseudospectral multidomain algorithms is investigated. Using the properties of spectral expansions, a novel parallel two domain procedure is generalized to an arbitrary number of domains each of which can be solved on a separate processor. This interface boundary condition considerably simplifies influence matrix techniques.

  19. Parallel pseudospectral domain decomposition techniques

    NASA Technical Reports Server (NTRS)

    Gottlieb, David; Hirsch, Richard S.

    1989-01-01

    The influence of interface boundary conditions on the ability to parallelize pseudospectral multidomain algorithms is investigated. Using the properties of spectral expansions, a novel parallel two domain procedure is generalized to an arbitrary number of domains each of which can be solved on a separate processor. This interface boundary condition considerably simplifies influence matrix techniques.

  20. A Parallel Particle Swarm Optimizer

    DTIC Science & Technology

    2003-01-01

    by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based...concurrent computation. The parallelization of the Particle Swarm Optimization (PSO) algorithm is detailed, and its performance and characteristics are demonstrated for the biomechanical system identification problem as an example.
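
    For context, the update rule being parallelized is short; the expensive part is the per-particle fitness evaluation, which is what gets farmed out to processors. A minimal serial sketch with textbook coefficients (not the report's settings):

    ```python
    # Minimal particle swarm optimizer. In the parallel variant, the fitness
    # evaluations (the marked line) are distributed across processors.
    import numpy as np

    def pso(fitness, dim=2, particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
        rng = np.random.default_rng(1)
        x = rng.uniform(-5, 5, (particles, dim))          # positions
        v = np.zeros_like(x)                              # velocities
        pbest = x.copy()
        pbest_f = np.array([fitness(p) for p in x])
        g = pbest[pbest_f.argmin()].copy()                # global best
        for _ in range(iters):
            r1, r2 = rng.random((2, particles, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
            x = x + v
            f = np.array([fitness(p) for p in x])         # <- parallelizable step
            better = f < pbest_f
            pbest[better], pbest_f[better] = x[better], f[better]
            g = pbest[pbest_f.argmin()].copy()
        return g

    print(pso(lambda p: float((p ** 2).sum())))           # approaches the origin
    ```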

  1. Self-Reported quality of life in adults with attention-deficit/hyperactivity disorder and executive function impairment treated with lisdexamfetamine dimesylate: a randomized, double-blind, multicenter, placebo-controlled, parallel-group study.

    PubMed

    Adler, Lenard A; Dirks, Bryan; Deas, Patrick; Raychaudhuri, Aparna; Dauphin, Matthew; Saylor, Keith; Weisler, Richard

    2013-10-09

    This study examined the effects of lisdexamfetamine dimesylate (LDX) on quality of life (QOL) in adults with attention-deficit/hyperactivity disorder (ADHD) and clinically significant executive function deficits (EFD). This report highlights QOL findings from a 10-week randomized placebo-controlled trial of LDX (30-70 mg/d) in adults (18-55 years) with ADHD and EFD (Behavior Rating Inventory of EF-Adult, Global Executive Composite [BRIEF-A GEC] ≥65). The primary efficacy measure was the self-reported BRIEF-A; a key secondary measure was self-reported QOL on the Adult ADHD Impact Module (AIM-A). The clinician-completed ADHD Rating Scale version IV (ADHD-RS-IV) with adult prompts and Clinical Global Impressions-Severity (CGI-S) were also employed. The Adult ADHD QoL (AAQoL) was added while the study was in progress. A post hoc analysis examined the subgroup having evaluable results from both AIM-A and AAQoL. Of 161 randomized (placebo, 81; LDX, 80), 159 were included in the safety population. LDX improved AIM-A multi-item domain scores versus placebo; LS mean difference for Performance and Daily Functioning was 21.6 (ES, 0.93, P<.0001); Impact of Symptoms: Daily Interference was 14.9 (ES, 0.62, P<.0001); Impact of Symptoms: Bother/Concern was 13.5 (ES, 0.57, P=.0003); Relationships/Communication was 7.8 (ES, 0.31, P=.0302); Living With ADHD was 9.1 (ES, 0.79, P<.0001); and General Well-Being was 10.8 (ES, 0.70, P<.0001). AAQoL LS mean difference for total score was 21.0; for subscale: Life Productivity was 21.0; Psychological Health was 12.1; Life Outlook was 12.5; and Relationships was 7.3. In a post hoc analysis of participants with both AIM-A and AAQoL scores, AIM-A multi-item subgroup analysis scores numerically improved with LDX, with smaller difference for Impact of Symptoms: Daily Interference. The safety profile of LDX was consistent with amphetamine use in previous studies. Overall, adults with ADHD/EFD exhibited self-reported improvement on QOL, using the

  2. Parallel contingency statistics with Titan.

    SciTech Connect

    Thompson, David C.; Pebay, Philippe Pierre

    2009-09-01

    This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09], which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of this new parallel engine is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevents this new engine from exhibiting the optimal parallel speed-up that the aforementioned engines do. This report therefore discusses the design trade-offs we made and studies performance with up to 200 processors.

  3. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  4. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining

  5. Kalman Filter Tracking on Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2016-11-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment.
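
    For readers outside tracking, the kernel being vectorized is small: a predict step and a gain-weighted update per hit. A toy 1-D constant-velocity example (illustrative only; the work above applies many such updates across candidate tracks simultaneously):

    ```python
    # One-track Kalman filter: predict/update over a few position measurements
    # (1-D constant-velocity toy model, not the experiments' vectorized kernel).
    import numpy as np

    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # transition: x' = x + v*dt, dt = 1
    H = np.array([[1.0, 0.0]])               # we measure position only
    Q = 1e-3 * np.eye(2)                     # process noise
    R = np.array([[0.25]])                   # measurement noise

    x = np.array([0.0, 1.0])                 # state [position, velocity]
    P = np.eye(2)                            # state covariance

    for z in [1.1, 1.9, 3.2, 4.0]:           # a short sequence of hits
        x, P = F @ x, F @ P @ F.T + Q        # predict
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ (z - H @ x)              # update state with the innovation
        P = (np.eye(2) - K @ H) @ P          # update covariance
    print(x)                                 # estimate nears position 4, velocity 1
    ```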

  6. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Lau, Sonie; Yan, Jerry C.

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

  7. Parallel NPARC: Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Townsend, S. E.

    1996-01-01

    Version 3 of the NPARC Navier-Stokes code includes support for large-grain (block level) parallelism using explicit message passing between a heterogeneous collection of computers. This capability has the potential for significant performance gains, depending upon the block data distribution. The parallel implementation uses a master/worker arrangement of processes. The master process assigns blocks to workers, controls worker actions, and provides remote file access for the workers. The processes communicate via explicit message passing using an interface library which provides portability to a number of message passing libraries, such as PVM (Parallel Virtual Machine). A Bourne shell script is used to simplify the task of selecting hosts, starting processes, retrieving remote files, and terminating a computation. This script also provides a simple form of fault tolerance. An analysis of the computational performance of NPARC is presented, using data sets from an F/A-18 inlet study and a Rocket Based Combined Cycle Engine analysis. Parallel speedup and overall computational efficiency were obtained for various NPARC run parameters on a cluster of IBM RS6000 workstations. The data show that although NPARC performance compares favorably with the estimated potential parallelism, typical data sets used with previous versions of NPARC will often need to be reblocked for optimum parallel performance. In one of the cases studied, reblocking increased peak parallel speedup from 3.2 to 11.8.

  8. Parallel processing for control applications

    SciTech Connect

    Telford, J. W.

    2001-01-01

    Parallel processing has been a topic of discussion in computer science circles for decades. Using more than one computer to control a process has many advantages that compensate for the additional cost. Initially, multiple computers were used to attain higher speeds: a single CPU could not perform all of the operations necessary for real-time operation. As technology progressed and CPUs became faster, the speed issue became less significant. The additional processing capability, however, continues to make high speed an attractive element of parallel processing. Another reason for multiple processors is reliability. For the purposes of this discussion, reliability and robustness will be the focal point. Most contemporary conceptions of parallel processing include visions of hundreds of single computers networked to provide 'computing power'. Indeed, our own teraflop machines are built from large numbers of computers configured in a network (and thus limited by the network). There are many approaches to parallel configurations, and this presentation offers something slightly different from the contemporary networked model. In the world of embedded computers, which is a pervasive force in contemporary computer controls, there are many single-chip computers available. If one backs away from the PC-based parallel computing model and considers the possibilities of a parallel control device based on multiple single-chip computers, a new area of possibilities becomes apparent. This study will look at the use of multiple single-chip computers in a parallel configuration, with emphasis placed on maximum reliability.

  9. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens; Inglett, Todd Alan

    2009-01-13

    A method and apparatus for a template-based parallel checkpoint save for a massively parallel supercomputer system, using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high-speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
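
    The rsync-flavored core of the claim fits in a few lines: checksum fixed-size blocks of each node's checkpoint and keep only the blocks whose checksums differ from the broadcast template. Block size and hash below are arbitrary choices, not the patent's.

    ```python
    # Sketch of template-based checkpointing: retain only the blocks that differ
    # from a previously saved template checkpoint (block size/hash are arbitrary).
    import hashlib

    BLOCK = 4096

    def block_sums(data: bytes):
        return [hashlib.sha1(data[i:i + BLOCK]).digest()
                for i in range(0, len(data), BLOCK)]

    def delta_blocks(node_state: bytes, template: bytes):
        t = block_sums(template)
        return {i: node_state[i * BLOCK:(i + 1) * BLOCK]
                for i, s in enumerate(block_sums(node_state))
                if i >= len(t) or s != t[i]}     # only these blocks get stored

    template = bytes(16 * BLOCK)                       # prior checkpoint contents
    state = bytearray(template)
    state[5 * BLOCK] = 1                               # node changed one block
    print(sorted(delta_blocks(bytes(state), template)))  # -> [5]
    ```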

  10. EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS

    SciTech Connect

    F. PETRINI; W. FENG

    1999-09-01

    We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of flow-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.

  11. Parallel integer sorting with medium and fine-scale parallelism

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128-processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.

  12. Parallel, Implicit, Finite Element Solver

    NASA Astrophysics Data System (ADS)

    Lowrie, Weston; Shumlak, Uri; Meier, Eric; Marklin, George

    2007-11-01

    A parallel, implicit, finite element solver is described for solutions to the ideal MHD equations and the Pseudo-1D Euler equations. The solver uses the conservative flux source form of the equations. This helps simplify the discretization of the finite element method by keeping the specification of the physics separate. An implicit time advance is used to allow sufficiently large time steps. The Portable Extensible Toolkit for Scientific Computation (PETSc) is implemented for parallel matrix solvers and parallel data structures. Results for several test cases are described as well as accuracy of the method.

  13. Multigrid on massively parallel architectures

    SciTech Connect

    Falgout, R D; Jones, J E

    1999-09-17

    The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.
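
    For readers unfamiliar with the method being scaled, a minimal serial V-cycle for the 1D Poisson problem -u'' = f is sketched below; the parallel implementations discussed above distribute exactly these smoothing, restriction, and interpolation steps across processors. All parameters are generic textbook choices, not values from the paper.

    ```python
    import numpy as np

    def jacobi(u, f, h, sweeps=2, w=2/3):
        """Weighted-Jacobi smoother for -u'' = f on a uniform 1D grid."""
        for _ in range(sweeps):
            u[1:-1] = (1 - w) * u[1:-1] + w * 0.5 * (u[:-2] + u[2:] + h*h*f[1:-1])
        return u

    def v_cycle(u, f, h):
        """One V-cycle: smooth, restrict residual, recurse, correct, smooth."""
        u = jacobi(u, f, h)
        if len(u) <= 3:
            return u
        r = np.zeros_like(u)
        r[1:-1] = f[1:-1] + (u[:-2] - 2*u[1:-1] + u[2:]) / (h*h)  # residual
        rc = r[::2].copy()                       # restrict by injection
        ec = v_cycle(np.zeros_like(rc), rc, 2*h)  # coarse-grid solve
        e = np.interp(np.arange(len(u)), np.arange(0, len(u), 2), ec)
        u += e                                    # coarse-grid correction
        return jacobi(u, f, h)

    n = 129                                       # 2^k + 1 grid points
    h = 1.0 / (n - 1)
    f, u = np.ones(n), np.zeros(n)
    for _ in range(10):
        u = v_cycle(u, f, h)
    ```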

  14. Parallel Architecture For Robotics Computation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1990-01-01

    Universal Real-Time Robotic Controller and Simulator (URRCS) is highly parallel computing architecture for control and simulation of robot motion. Result of extensive algorithmic study of different kinematic and dynamic computational problems arising in control and simulation of robot motion. Study led to development of class of efficient parallel algorithms for these problems. Represents algorithmically specialized architecture, in sense capable of exploiting common properties of this class of parallel algorithms. System with both MIMD and SIMD capabilities. Regarded as processor attached to bus of external host processor, as part of bus memory.

  15. IOPA: I/O-aware parallelism adaption for parallel programs

    PubMed Central

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; conversely, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
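
    The control loop at the heart of such a mechanism can be sketched as a simple throughput-driven hill climber: grow the thread count while measured throughput improves, back off when it drops. Everything below (class name, policy, bounds) is an illustrative assumption, not IOPA's actual interface.

    ```python
    class ParallelismController:
        """Toy version of the idea behind IOPA: adapt the number of I/O
        threads to the measured I/O capability of the system."""
        def __init__(self, min_threads=1, max_threads=64):
            self.n = min_threads
            self.min, self.max = min_threads, max_threads
            self.last_tp = 0.0
            self.step = 1

        def update(self, throughput):
            """Feed back the throughput observed with self.n threads."""
            if throughput >= self.last_tp:
                self.n = min(self.max, max(self.min, self.n + self.step))
            else:
                self.step = -self.step            # throughput fell: reverse
                self.n = min(self.max, max(self.min, self.n + self.step))
            self.last_tp = throughput
            return self.n

    ctl = ParallelismController()
    for tp in [100.0, 180.0, 240.0, 230.0, 235.0]:   # measured MB/s, say
        print(ctl.update(tp))
    ```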

  16. Quantitative selection and parallel characterization of aptamers

    PubMed Central

    Cho, Minseon; Soo Oh, Seung; Nie, Jeff; Stewart, Ron; Eisenstein, Michael; Chambers, James; Marth, Jamey D.; Walker, Faye; Thomson, James A.; Soh, H. Tom

    2013-01-01

    Aptamers are promising affinity reagents that are potentially well suited for high-throughput discovery, as they are chemically synthesized and discovered via completely in vitro selection processes. Recent advancements in selection, sequencing, and the use of modified bases have improved aptamer quality, but the overall process of aptamer generation remains laborious and low-throughput. This is because binding characterization remains a critical bottleneck, wherein the affinity and specificity of each candidate aptamer are measured individually in a serial manner. To accelerate aptamer discovery, we devised the Quantitative Parallel Aptamer Selection System (QPASS), which integrates microfluidic selection and next-generation sequencing with in situ-synthesized aptamer arrays, enabling simultaneous measurement of affinity and specificity for thousands of candidate aptamers in parallel. After using QPASS to select aptamers for the human cancer biomarker angiopoietin-2 (Ang2), we in situ synthesized arrays of the selected sequences and obtained equilibrium dissociation constants (Kd) for every aptamer in parallel. We thereby identified over a dozen high-affinity Ang2 aptamers, with Kd as low as 20.5 ± 7.3 nM. The same arrays enabled us to quantify binding specificity for these aptamers in parallel by comparing relative binding of differentially labeled target and nontarget proteins, and by measuring their binding affinity directly in complex samples such as undiluted serum. Finally, we show that QPASS offers a compelling avenue for exploring structure−function relationships for large numbers of aptamers in parallel by coupling array-based affinity measurements with next-generation sequencing data to identify nucleotides and motifs within the aptamer that critically affect Ang2 binding. PMID:24167271

  17. Appendix E: Parallel Pascal development system

    NASA Technical Reports Server (NTRS)

    1985-01-01

    The Parallel Pascal Development System enables Parallel Pascal programs to be developed and tested on a conventional computer. It consists of several system programs, including a Parallel Pascal to standard Pascal translator, and a library of Parallel Pascal subprograms. The library includes subprograms for using Parallel Pascal on a parallel system with a fixed degree of parallelism, such as the Massively Parallel Processor, to conveniently manipulate arrays which have dimensions larger than the hardware provides. Programs can be conveniently tested with small-sized arrays on the conventional computer before attempting to run on a parallel system.

  18. Appendix E: Parallel Pascal development system

    NASA Technical Reports Server (NTRS)

    1985-01-01

    The Parallel Pascal Development System enables Parallel Pascal programs to be developed and tested on a conventional computer. It consists of several system programs, including a Parallel Pascal to standard Pascal translator, and a library of Parallel Pascal subprograms. The library includes subprograms for using Parallel Pascal on a parallel system with a fixed degree of parallelism, such as the Massively Parallel Processor, to conveniently manipulate arrays which have dimensions larger than the hardware provides. Programs can be conveniently tested with small-sized arrays on the conventional computer before attempting to run on a parallel system.

  19. "Feeling" Series and Parallel Resistances.

    ERIC Educational Resources Information Center

    Morse, Robert A.

    1993-01-01

    Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)

  20. Demonstrating Forces between Parallel Wires.

    ERIC Educational Resources Information Center

    Baker, Blane

    2000-01-01

    Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)

  1. Parallel programming of industrial applications

    SciTech Connect

    Heroux, M; Koniges, A; Simon, H

    1998-07-21

    In the introductory material, we overview the typical MPP environment for real application computing and the special tools available such as parallel debuggers and performance analyzers. Next, we draw from a series of real applications codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally we end with a summary of the lessons learned from these applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled: "Industrial Strength Parallel Computing" to be published by Morgan Kaufmann Publishers (ISBN 1-55860-54).

  2. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

  3. Demonstrating Forces between Parallel Wires.

    ERIC Educational Resources Information Center

    Baker, Blane

    2000-01-01

    Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)

  4. Parallel hierarchical method in networks

    NASA Astrophysics Data System (ADS)

    Malinochka, Olha; Tymchenko, Leonid

    2007-09-01

    The method of parallel-hierarchical Q-transformation offers a new approach to the creation of a computing medium: parallel-hierarchical (PH) networks, investigated in the form of a model of a neural-like data-processing scheme [1-5]. The approach has a number of advantages compared with other methods of forming neural-like media (for example, the established methods of forming artificial neural networks). Its main advantage is the use of the multilevel parallel interaction dynamics of information signals at different hierarchy levels of computer networks, which makes it possible to exploit such known natural features of the organization of computation as the topographic nature of mapping, the simultaneity (parallelism) of signal operation, the inlaid structure of the cortex, the rough hierarchy of the cortex, and the spatially and temporally correlated mechanism of perception and training [5].

  5. "Feeling" Series and Parallel Resistances.

    ERIC Educational Resources Information Center

    Morse, Robert A.

    1993-01-01

    Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)

  6. Address tracing for parallel machines

    NASA Technical Reports Server (NTRS)

    Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

    1991-01-01

    Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.

  7. Debugging in a parallel environment

    SciTech Connect

    Wasserman, H.J.; Griffin, J.H.

    1985-01-01

    This paper describes the preliminary results of a project investigating approaches to dynamic debugging in parallel processing systems. Debugging programs in a multiprocessing environment is particularly difficult because of potential errors in synchronization of tasks, data dependencies, sharing of data among tasks, and irreproducibility of specific machine instruction sequences from one job to the next. The basic methodology involved in predicate-based debuggers is given as well as other desirable features of dynamic parallel debugging. 13 refs.

  8. Parallel Algorithms for Image Analysis.

    DTIC Science & Technology

    1982-06-01

    [OCR fragment of the DTIC report documentation page; recoverable details: technical report TR-1180, "Parallel Algorithms for Image Analysis," by Azriel Rosenfeld, contract AFOSR-77-3271. Keywords: image processing; image analysis; parallel processing; cellular computers.]

  9. Parallel software tools at Langley Research Center

    NASA Technical Reports Server (NTRS)

    Moitra, Stuti; Tennille, Geoffrey M.; Lakeotes, Christopher D.; Randall, Donald P.; Arthur, Jarvis J.; Hammond, Dana P.; Mall, Gerald H.

    1993-01-01

    This document gives a brief overview of parallel software tools available on the Intel iPSC/860 parallel computer at Langley Research Center. It is intended to provide a source of information that is somewhat more concise than vendor-supplied material on the purpose and use of various tools. Each of the chapters on tools is organized in a similar manner covering an overview of the functionality, access information, how to effectively use the tool, observations about the tool and how it compares to similar software, known problems or shortfalls with the software, and reference documentation. It is primarily intended for users of the iPSC/860 at Langley Research Center and is appropriate for both the experienced and novice user.

  10. Flexibility and Performance of Parallel File Systems

    NASA Technical Reports Server (NTRS)

    Kotz, David; Nieuwejaar, Nils

    1996-01-01

    As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions makes applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (APIs). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.
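
    As a toy example of the proposed split, a strided read can live in a high-level library built purely on a minimal seek/read core. The function below is a hypothetical illustration of such a library call, not an API from the paper.

    ```python
    import io

    def strided_read(f, offset, record_size, stride, count):
        """Hypothetical high-level library call: gather `count` records of
        `record_size` bytes spaced `stride` bytes apart, using only the
        primitive core interface (seek/read)."""
        records = []
        for i in range(count):
            f.seek(offset + i * stride)
            records.append(f.read(record_size))
        return records

    # Read every 4th 8-byte record from an in-memory "file".
    f = io.BytesIO(bytes(range(256)))
    recs = strided_read(f, offset=0, record_size=8, stride=32, count=4)
    print([r[0] for r in recs])   # -> [0, 32, 64, 96]
    ```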

  11. Oxytocin: parallel processing in the social brain?

    PubMed

    Dölen, Gül

    2015-06-01

    Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. © 2015 The Authors. Journal of Neuroendocrinology published by John Wiley & Sons Ltd on behalf of British Society for Neuroendocrinology.

  12. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.

  13. Parallel processing of natural language

    SciTech Connect

    Chang, H.O.

    1986-01-01

    Two types of parallel natural language processing are studied in this work: (1) the parallelism between syntactic and nonsyntactic processing and (2) the parallelism within syntactic processing. It is recognized that a syntactic category can potentially be attached to more than one node in the syntactic tree of a sentence. Even if all the attachments are syntactically well-formed, nonsyntactic factors such as semantic and pragmatic considerations may require one particular attachment. Syntactic processing must synchronize and communicate with nonsyntactic processing. Two syntactic processing algorithms are proposed for use in a parallel environment: Earley's algorithm and the LR(k) algorithm. Conditions are identified to detect syntactic ambiguity and the algorithms are augmented accordingly. It is shown that by using nonsyntactic information during syntactic processing, backtracking can be reduced and the performance of the syntactic processor improved. For the second type of parallelism, it is recognized that one portion of a grammar can be isolated from the rest of the grammar and processed by a separate processor. A partial grammar of a larger grammar is defined. Parallel syntactic processing is achieved by using two processors concurrently: the main processor (mp) and the auxiliary processor (ap).

  14. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.

  15. Efficiency of parallel direct optimization.

    PubMed

    Janies, D A; Wheeler, W C

    2001-03-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size.

  16. Modelling Layer parallel stylolites

    NASA Astrophysics Data System (ADS)

    Koehn, Daniel; Pataki Rood, Daisy; Beaudoin, Nicolas

    2016-04-01

    We modeled the geometrical roughening of mainly layer-dominated stylolites in order to understand their structural evolution, to present an advanced classification of stylolite shapes and to relate this classification to chemical compaction and stylolite sealing capabilities. Our simulations show that layer-dominated stylolites can grow in three distinct stages, an initial slow nucleation, a fast layer-pinning phase and a final freezing stage if the layer dissolves completely during growth. Dissolution of the pinning layer and thus destruction of the compaction tracking capabilities is a function of the background noise in the rock and the dissolution rate of the layer itself. Low background noise needs a slower dissolving layer for pinning to be successful but produces flatter teeth than higher background noise. We present an advanced classification based on our simulations and separate stylolites into four classes: rectangular layer type, seismogram pinning type, suture/sharp peak type and simple wave-like type.

  17. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  18. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  19. Parallels in lignin biosynthesis

    PubMed Central

    Weng, Jing-Ke; Banks, Jo Ann

    2008-01-01

    A hallmark of vascular plants is the development of a complex water-conducting system, which is physically reinforced by the heterogeneous aromatic polymer lignin. Syringyl lignin, a major building block of lignin, is often thought to be uniquely characteristic of angiosperms; however, it was demonstrated over fifty years ago that syringyl lignin is found in another group of plants, known as the lycophytes, the ancestors of which diverged from all the other vascular plant lineages 400 million years ago.1 To determine the biochemical basis for this common biosynthetic ability, we isolated and characterized cytochrome P450-dependent monooxygenases (P450s) from the lycophyte Selaginella moellendorffii and compared them to the enzyme that is required for syringyl lignin synthesis in angiosperms. Our results showed that one of these P450s encodes an enzyme that is functionally analogous to but phylogenetically independent from its angiosperm counterpart. Here, we discuss the evolution of lignin biosynthesis in vascular plants and the role of Selaginella moellendorffii in plant comparative biology and genomics. PMID:19704782

  20. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-18

    Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  1. The economics of parallel trade.

    PubMed

    Danzon, P M

    1998-03-01

    The potential for parallel trade in the European Union (EU) has grown with the accession of low price countries and the harmonisation of registration requirements. Parallel trade implies a conflict between the principle of autonomy of member states to set their own pharmaceutical prices, the principle of free trade and the industrial policy goal of promoting innovative research and development (R&D). Parallel trade in pharmaceuticals does not yield the normal efficiency gains from trade because countries achieve low pharmaceutical prices by aggressive regulation, not through superior efficiency. In fact, parallel trade reduces economic welfare by undermining price differentials between markets. Pharmaceutical R&D is a global joint cost of serving all consumers worldwide; it accounts for roughly 30% of total costs. Optimal (welfare maximising) pricing to cover joint costs (Ramsey pricing) requires setting different prices in different markets, based on inverse demand elasticities. By contrast, parallel trade and regulation based on international price comparisons tend to force price convergence across markets. In response, manufacturers attempt to set a uniform 'euro' price. The primary losers from 'euro' pricing will be consumers in low income countries who will face higher prices or loss of access to new drugs. In the long run, even higher income countries are likely to be worse off with uniform prices, because fewer drugs will be developed. One policy option to preserve price differentials is to exempt on-patent products from parallel trade. An alternative is confidential contracting between individual manufacturers and governments to provide country-specific ex post discounts from the single 'euro' wholesale price, similar to rebates used by managed care in the US. This would preserve differentials in transactions prices even if parallel trade forces convergence of wholesale prices.

  2. CALTRANS: A parallel, deterministic, 3D neutronics code

    SciTech Connect

    Carson, L.; Ferguson, J.; Rogers, J.

    1994-04-01

    Our efforts to parallelize the deterministic solution of the neutron transport equation have culminated in a new neutronics code, CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementation of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.

  3. Programming parallel architectures: The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  4. Parallelizing the track-target model for the MIMD machine

    SciTech Connect

    Zhong Xiong, W.; Swietlik, C.

    1992-09-01

    Military Tracking-Target systems are important analysis tools for modelling the major functions of a strategic defense system operating against a ballistic missile threat during a simulated end-to-end scenario. As demands grow for modelling more trajectories with increasing numbers of missile types, so have demands for more processing power. Argonne National Laboratory has developed the parallel version of this Tracking-Target model. The parallel version has exhibited speedups of up to a factor of 6.3 resulting from a shared memory multiprocessor machine. This paper documents a project to implement the Tracking-Target model on a parallel processing environment.

  5. Memory Scalability and Efficiency Analysis of Parallel Codes

    SciTech Connect

    Janjusic, Tommy; Kartsaklis, Christos

    2015-01-01

    Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for High Performance Systems are typically designed over the span of several, and in some instances 10+, years. As a result, optimization practices which were appropriate for earlier systems may no longer be valid and thus require careful optimization consideration. Specifically, parallel codes whose memory footprint is a function of their scalability must be carefully considered for future exa-scale systems. In this paper we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology we evaluate an application's memory footprint as a function of scalability, which we coined memory efficiency, and describe our results. In particular, using our in-house tools we can pinpoint the specific application components which contribute to the application's overall memory footprint (application data structures, libraries, etc.).

  6. General upper bounds on the runtime of parallel evolutionary algorithms.

    PubMed

    Lässig, Jörg; Sudholt, Dirk

    2014-01-01

    We present a general method for analyzing the runtime of parallel evolutionary algorithms with spatially structured populations. Based on the fitness-level method, it yields upper bounds on the expected parallel runtime. This allows for a rigorous estimate of the speedup gained by parallelization. Tailored results are given for common migration topologies: ring graphs, torus graphs, hypercubes, and the complete graph. Example applications for pseudo-Boolean optimization show that our method is easy to apply and that it gives powerful results. In our examples the performance guarantees improve with the density of the topology. Surprisingly, even sparse topologies such as ring graphs lead to a significant speedup for many functions while not increasing the total number of function evaluations by more than a constant factor. We also identify which number of processors leads to the best guaranteed speedups, thus giving hints on how to parameterize parallel evolutionary algorithms.
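
    A minimal island-model sketch in the spirit of the algorithms analysed above: independent (1+1) EAs arranged on a ring topology, with periodic migration of good individuals to the next island, run here on OneMax. Island count, migration interval, and mutation rate are generic illustrative choices, not parameters from the paper.

    ```python
    import random

    def one_max(x):
        return sum(x)

    def island_ea(n_islands=8, n_bits=64, migrate_every=50, steps=20000):
        """Island-model (1+1) EA on a ring: islands evolve independently
        (these are the loops that would run in parallel) and periodically
        pass fit individuals to their ring neighbour."""
        pop = [[random.randint(0, 1) for _ in range(n_bits)]
               for _ in range(n_islands)]
        for t in range(1, steps + 1):
            for i in range(n_islands):                 # independent islands
                child = [b ^ (random.random() < 1.0 / n_bits) for b in pop[i]]
                if one_max(child) >= one_max(pop[i]):
                    pop[i] = child
            if t % migrate_every == 0:                 # ring migration
                pop = [max(pop[i - 1], pop[i], key=one_max)
                       for i in range(n_islands)]
            if any(one_max(x) == n_bits for x in pop):
                return t                               # generations to optimum
        return steps

    print(island_ea())
    ```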

  7. Approximating Functions with Exponential Functions

    ERIC Educational Resources Information Center

    Gordon, Sheldon P.

    2005-01-01

    The possibility of approximating a function with a linear combination of exponential functions of the form e^x, e^(2x), ... is considered as a parallel development to the notion of Taylor polynomials, which approximate a function with a linear combination of power function terms. The sinusoidal functions sin x and cos x…

  8. A class of trust-region methods for parallel optimization

    SciTech Connect

    P. D. Hough; J. C. Meza

    1999-03-01

    The authors present a new class of optimization methods that incorporates a Parallel Direct Search (PDS) method within a trust-region Newton framework. This approach combines the inherent parallelism of PDS with the rapid and robust convergence properties of Newton methods. Numerical tests have yielded favorable results for both standard test problems and engineering applications. In addition, the new method appears to be more robust in the presence of noisy functions that are inherent in many engineering simulations.

  9. Computing contingency statistics in parallel.

    SciTech Connect

    Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

    2010-09-01

    Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and {chi}{sup 2} independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
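
    The map-reduce structure described above can be sketched in a few lines: each processor tabulates its own observations, tables merge by simple addition, and derived statistics follow from the marginals. The chi-squared expression below sums observed cells only, a simplification flagged in the comments; everything here is an illustrative sketch, not the paper's implementation.

    ```python
    from collections import Counter
    from functools import reduce

    def local_table(pairs):
        """Map step: each processor tabulates its own (x, y) observations."""
        return Counter(pairs)

    def merge(a, b):
        """Reduce step: contingency tables merge by addition, which is what
        keeps the computation parallel until table size drives up traffic."""
        return a + b

    chunks = [[("a", 0), ("a", 1)], [("b", 1), ("a", 0)], [("b", 0)]]
    table = reduce(merge, (local_table(c) for c in chunks))

    n = sum(table.values())
    px, py = Counter(), Counter()              # marginals
    for (x, y), c in table.items():
        px[x] += c
        py[y] += c
    chi2 = sum((c - px[x] * py[y] / n) ** 2 / (px[x] * py[y] / n)
               for (x, y), c in table.items())  # observed cells only; a full
                                                # chi^2 also counts empty cells
    print(table, chi2)
    ```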

  10. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  11. Genomics of Parallel Experimental Evolution in Drosophila

    PubMed Central

    Graves, J.L.; Hertweck, K.L.; Han, M.V.; Cabral, L.G.; Barter, T.T.; Greer, L.F.; Burke, M.K.; Mueller, L.D.; Rose, M.R.

    2017-01-01

    Abstract What are the genomic foundations of adaptation in sexual populations? We address this question using fitness–character and whole-genome sequence data from 30 Drosophila laboratory populations. These 30 populations are part of a nearly 40-year laboratory radiation featuring 3 selection regimes, each shared by 10 populations for up to 837 generations, with moderately large effective population sizes. Each of 3 sets of the 10 populations that shared a selection regime consists of 5 populations that have long been maintained under that selection regime, paired with 5 populations that had only recently been subjected to that selection regime. We find a high degree of evolutionary parallelism in fitness phenotypes when most-recent selection regimes are shared, as in previous studies from our laboratory. We also find genomic parallelism with respect to the frequencies of single-nucleotide polymorphisms, transposable elements, insertions, and structural variants, which was expected. Entirely unexpected was a high degree of parallelism for linkage disequilibrium. The evolutionary genetic changes among these sexual populations are rapid and genomically extensive. This pattern may be due to segregating functional genetic variation that is abundantly maintained genome-wide by selection, variation that responds immediately to changes of selection regime. PMID:28087779

  12. Genomics of Parallel Experimental Evolution in Drosophila.

    PubMed

    Graves, J L; Hertweck, K L; Phillips, M A; Han, M V; Cabral, L G; Barter, T T; Greer, L F; Burke, M K; Mueller, L D; Rose, M R

    2017-04-01

    What are the genomic foundations of adaptation in sexual populations? We address this question using fitness-character and whole-genome sequence data from 30 Drosophila laboratory populations. These 30 populations are part of a nearly 40-year laboratory radiation featuring 3 selection regimes, each shared by 10 populations for up to 837 generations, with moderately large effective population sizes. Each of 3 sets of the 10 populations that shared a selection regime consists of 5 populations that have long been maintained under that selection regime, paired with 5 populations that had only recently been subjected to that selection regime. We find a high degree of evolutionary parallelism in fitness phenotypes when most-recent selection regimes are shared, as in previous studies from our laboratory. We also find genomic parallelism with respect to the frequencies of single-nucleotide polymorphisms, transposable elements, insertions, and structural variants, which was expected. Entirely unexpected was a high degree of parallelism for linkage disequilibrium. The evolutionary genetic changes among these sexual populations are rapid and genomically extensive. This pattern may be due to segregating functional genetic variation that is abundantly maintained genome-wide by selection, variation that responds immediately to changes of selection regime. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  13. Kalman Filter Tracking on Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2015-12-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques, including Cellular Automata or a return to the Hough Transform. The most common track finding techniques in use today, however, are those based on the Kalman Filter [2]. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are exactly those being used today for the design of the tracking system for HL-LHC. Our previous investigations showed that, using optimized data structures, track fitting with the Kalman Filter can achieve large speedups on both Intel Xeon and Xeon Phi. We report here our further progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a realistic simulation setup.
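
    To illustrate the kind of data-structure-driven vectorization referred to above, here is a NumPy sketch (not the authors' code) of one Kalman predict-and-update step applied to many one-dimensional constant-velocity tracks at once, in a structure-of-arrays layout so the arithmetic maps onto wide vector units.

    ```python
    import numpy as np

    def kf_update_batch(x, P, z, F, Q, H, R):
        """One predict+update step for N tracks at once: x is (N, 2) state
        [pos, vel], P is (N, 2, 2) covariance, z is (N,) measurements.
        Vectorising over tracks is the SIMD idea discussed above."""
        # Predict
        x = x @ F.T
        P = F @ P @ F.T + Q
        # Update (scalar measurement per track)
        S = H @ P @ H.T + R                   # (N, 1, 1) innovation covariance
        K = (P @ H.T) / S                     # (N, 2, 1) Kalman gain
        resid = z - (x @ H.T.ravel())         # (N,) innovations
        x = x + K[..., 0] * resid[:, None]
        P = P - K @ (H @ P)
        return x, P

    N, dt = 10000, 1.0
    F = np.array([[1.0, dt], [0.0, 1.0]])     # constant-velocity transition
    Q = 0.01 * np.eye(2)                      # process noise
    H = np.array([[1.0, 0.0]])                # we measure position only
    R = np.array([[0.25]])                    # measurement noise
    x = np.zeros((N, 2))
    P = np.tile(np.eye(2), (N, 1, 1))
    z = np.random.randn(N)
    x, P = kf_update_batch(x, P, z, F, Q, H, R)
    ```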

  14. Parallel Stitching of Two-Dimensional Materials

    NASA Astrophysics Data System (ADS)

    Ling, Xi; Lin, Yuxuan; Dresselhaus, Mildred; Palacios, Tomás; Kong, Jing; Department of Electrical Engineering; Computer Science, Massachusetts Institute of Technology Team

    Large scale integration of atomically thin metals (e.g. graphene), semiconductors (e.g. transition metal dichalcogenides (TMDs)), and insulators (e.g. hexagonal boron nitride) is critical for constructing the building blocks for future nanoelectronics and nanophotonics. However, the construction of in-plane heterostructures, especially between two atomic layers with large lattice mismatch, could be extremely difficult due to the strict requirement of spatial precision and the lack of a selective etching method. Here, we developed a general synthesis methodology to achieve both vertical and in-plane "parallel stitched" heterostructures between two-dimensional (2D) and TMD materials, which enables both multifunctional electronic/optoelectronic devices and their large scale integration. This is achieved via selective "sowing" of aromatic molecule seeds during the chemical vapor deposition growth. MoS2 is used as a model system to form heterostructures with diverse other 2D materials. Direct and controllable synthesis of large-scale parallel stitched graphene-MoS2 heterostructures was further investigated. Unique nanometer overlapped junctions were obtained at the parallel stitched interface, which are highly desirable both as metal-semiconductor contacts and as functional devices/systems, such as for use in logical integrated circuits (ICs) and broadband photodetectors.

  15. Parallel Quantum Circuit in a Tunnel Junction

    NASA Astrophysics Data System (ADS)

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian; GNS theory Group Team

    In between 2 metallic nanopads, adding identical and independent electron transfer paths in parallel increases the electronic effective coupling between the 2 nanopads through the quantum circuit defined by those paths. Measuring this increase of effective coupling using the tunnelling current intensity leads, for example for 2 paths in parallel, to the now standard conductance superposition law G = G1 + G2 + 2√(G1·G2) (1). This is only valid in the tunnelling regime (2). For large electronic coupling to the nanopads (or at resonance), G can saturate and even decay as a function of the number of parallel paths added in the quantum circuit (3). We provide here the explanation of this phenomenon: the measurement of the effective Rabi oscillation frequency using the current intensity is constrained by the normalization principle of quantum mechanics. This limits the quantum conductance G, for example to the conductance quantum G0 when there is only one channel per metallic nanopad. This effect has important consequences for the design of Boolean logic gates at the atomic scale using atomic-scale or intramolecular circuits. This work has the financial support of the European PAMS project.
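
    A few lines suffice to evaluate the superposition law quoted above. The generalisation to n paths as (sum of sqrt(G_i))^2 is an assumption consistent with the two-path formula in the tunnelling regime; the saturation effect the abstract explains is deliberately outside this expression.

    ```python
    import math

    def parallel_tunnel_conductance(gs):
        """Tunnelling-regime superposition law: G = (sum_i sqrt(G_i))**2,
        which reduces to G1 + G2 + 2*sqrt(G1*G2) for two paths. Saturation
        at large coupling is *not* captured by this formula."""
        return sum(math.sqrt(g) for g in gs) ** 2

    g1 = g2 = 1e-6                 # illustrative path conductances (siemens)
    print(parallel_tunnel_conductance([g1, g2]))  # 4e-06: 2 identical paths, 4x one path
    ```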

  16. Visualizing Parallel Computer System Performance

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.

    1988-01-01

    Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed, almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels; it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkably adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized. Large, complex parallel systems pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eliding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.

  17. Massively parallel MRI detector arrays.

    PubMed

    Keil, Boris; Wald, Lawrence L

    2013-04-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called "ultimate" SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. Copyright © 2013 Elsevier Inc. All rights reserved.
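
    As a concrete example of combining array data, the root-sum-of-squares combination — a common baseline among the methods reviewed above — is essentially a one-liner; the SNR-optimal matched-filter combination the review also discusses would additionally weight by the inverse noise covariance across channels. The (n_coils, ny, nx) complex image stack below is an illustrative assumption.

    ```python
    import numpy as np

    def rss_combine(coil_images: np.ndarray) -> np.ndarray:
        """Root-sum-of-squares combination of per-coil complex images:
        shape (n_coils, ny, nx) -> (ny, nx). A simple, widely used baseline;
        SNR-optimal combination also uses the noise covariance."""
        return np.sqrt((np.abs(coil_images) ** 2).sum(axis=0))

    imgs = np.random.randn(32, 64, 64) + 1j * np.random.randn(32, 64, 64)
    combined = rss_combine(imgs)
    print(combined.shape)   # (64, 64)
    ```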

  18. Features in Continuous Parallel Coordinates.

    PubMed

    Lehmann, Dirk J; Theisel, Holger

    2011-12-01

    Continuous Parallel Coordinates (CPC) are a contemporary visualization technique in order to combine several scalar fields, given over a common domain. They facilitate a continuous view for parallel coordinates by considering a smooth scalar field instead of a finite number of straight lines. We show that there are feature curves in CPC which appear to be the dominant structures of a CPC. We present methods to extract and classify them and demonstrate their usefulness to enhance the visualization of CPCs. In particular, we show that these feature curves are related to discontinuities in Continuous Scatterplots (CSP). We show this by exploiting a curve-curve duality between parallel and Cartesian coordinates, which is a generalization of the well-known point-line duality. Furthermore, we illustrate the theoretical considerations. Concluding, we discuss relations and aspects of the CPC's/CSP's features concerning the data analysis.

  19. Parallel integrated frame synchronizer chip

    NASA Technical Reports Server (NTRS)

    Ghuman, Parminder Singh (Inventor); Solomon, Jeffrey Michael (Inventor); Bennett, Toby Dennis (Inventor)

    2000-01-01

    A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.

  20. PARAVT: Parallel Voronoi tessellation code

    NASA Astrophysics Data System (ADS)

    González, R. E.

    2016-10-01

    In this study, we present a new open source code for massive parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is focused on astrophysical purposes, where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes; however, no open source, parallel implementation is available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI, and the VT is computed using the Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes a neighbors list, the Voronoi density, the Voronoi cell volume, and the density gradient for each particle, as well as densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.
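
    A serial analogue of the per-particle quantities PARAVT computes, using scipy.spatial (which wraps the same Qhull library); what PARAVT adds is the MPI domain decomposition, consistent boundaries, and periodic conditions. This sketch is illustrative, not code from the package.

    ```python
    import numpy as np
    from scipy.spatial import Voronoi, ConvexHull

    pts = np.random.rand(500, 3)
    vor = Voronoi(pts)                         # Qhull under the hood

    # Neighbour lists: each Voronoi ridge separates exactly two input points.
    neighbors = {i: set() for i in range(len(pts))}
    for p, q in vor.ridge_points:
        neighbors[p].add(q)
        neighbors[q].add(p)

    # Voronoi density ~ 1 / cell volume, defined for bounded cells only.
    def cell_volume(i):
        region = vor.regions[vor.point_region[i]]
        if -1 in region or len(region) == 0:
            return np.inf                      # unbounded cell on the boundary
        return ConvexHull(vor.vertices[region]).volume

    vols = np.array([cell_volume(i) for i in range(len(pts))])
    density = 1.0 / vols
    ```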

  1. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.

  2. Massively Parallel MRI Detector Arrays

    PubMed Central

    Keil, Boris; Wald, Lawrence L

    2013-01-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758

  3. Fast data parallel polygon rendering

    SciTech Connect

    Ortega, F.A.; Hansen, C.D.

    1993-09-01

    This paper describes a parallel method for polygonal rendering on a massively parallel SIMD machine. This method, based on a simple shading model, is targeted for applications which require very fast polygon rendering for extremely large sets of polygons such as is found in many scientific visualization applications. The algorithms described in this paper are incorporated into a library of 3D graphics routines written for the Connection Machine. The routines are implemented on both the CM-200 and the CM-5. This library enables scientists to display 3D shaded polygons directly from a parallel machine without the need to transmit huge amounts of data to a post-processing rendering system.

  4. FEREBUS: Highly parallelized engine for kriging training.

    PubMed

    Di Pasquale, Nicodemo; Bane, Michael; Davie, Stuart J; Popelier, Paul L A

    2016-11-05

    FFLUX is a novel force field based on quantum topological atoms, combining multipolar electrostatics with IQA intraatomic and interatomic energy terms. The program FEREBUS calculates the hyperparameters of models produced by the machine learning method kriging. Calculation of kriging hyperparameters (θ and p) requires the optimization of the concentrated log-likelihood L̂(θ,p). FEREBUS uses Particle Swarm Optimization (PSO) and Differential Evolution (DE) algorithms to find the maximum of L̂(θ,p). PSO and DE are two heuristic algorithms that each use a set of particles or vectors to explore the space in which L̂(θ,p) is defined, searching for the maximum. The log-likelihood is a computationally expensive function, which needs to be calculated several times during each optimization iteration. The cost scales quickly with the problem dimension and speed becomes critical in model generation. We present the strategy used to parallelize FEREBUS, and the optimization of L̂(θ,p) through PSO and DE. The code is parallelized in two ways. MPI parallelization distributes the particles or vectors among the different processes, whereas the OpenMP implementation takes care of the calculation of L̂(θ,p), which involves the calculation and inversion of a particular matrix, whose size increases quickly with the dimension of the problem. The run time shows a speed-up of 61 times going from single core to 90 cores with a saving, in one case, of ∼98% of the single-core time. In fact, the parallelization scheme presented reduces computational time from 2871 s for a single-core calculation to 41 s for a 90-core calculation. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
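
    A minimal PSO loop of the kind FEREBUS applies to the concentrated log-likelihood is sketched below; in FEREBUS it is the expensive objective evaluations (marked in the comment) that get distributed across MPI ranks, with OpenMP inside each evaluation. Constants and names are generic PSO defaults, not FEREBUS settings.

    ```python
    import numpy as np

    def pso_maximize(f, dim, n_particles=30, iters=200, lo=-5.0, hi=5.0,
                     w=0.7, c1=1.5, c2=1.5, seed=0):
        """Minimal particle swarm optimiser: particles track their personal
        best and are pulled towards the swarm's global best."""
        rng = np.random.default_rng(seed)
        x = rng.uniform(lo, hi, (n_particles, dim))
        v = np.zeros_like(x)
        pbest = x.copy()
        pbest_f = np.array([f(p) for p in x])
        g = pbest[np.argmax(pbest_f)].copy()
        for _ in range(iters):
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
            x = np.clip(x + v, lo, hi)
            fx = np.array([f(p) for p in x])   # the expensive, parallelisable part
            better = fx > pbest_f
            pbest[better], pbest_f[better] = x[better], fx[better]
            g = pbest[np.argmax(pbest_f)].copy()
        return g, pbest_f.max()

    best, val = pso_maximize(lambda p: -np.sum(p ** 2), dim=3)
    print(best, val)   # near the origin, value near 0
    ```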

  5. Hybrid parallel programming with MPI and Unified Parallel C.

    SciTech Connect

    Dinan, J.; Balaji, P.; Lusk, E.; Sadayappan, P.; Thakur, R.; Mathematics and Computer Science; The Ohio State Univ.

    2010-01-01

    The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance; using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33 and using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.
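
    UPC itself is a C dialect, so the following mpi4py sketch is only a loose Python analogue of the hybrid pattern described above (groups of ranks sharing a memory segment, with MPI between groups); it is not the authors' implementation, and all names in it are ours. Run with e.g. mpiexec -n 8 python hybrid.py:

    import numpy as np
    from mpi4py import MPI

    world = MPI.COMM_WORLD
    # Ranks that can share memory form a group (like a static UPC group).
    node = world.Split_type(MPI.COMM_TYPE_SHARED)

    n = 1 << 20
    itemsize = MPI.DOUBLE.Get_size()
    # Rank 0 of each group allocates the shared segment; the rest attach.
    win = MPI.Win.Allocate_shared(n * itemsize if node.rank == 0 else 0,
                                  itemsize, comm=node)
    buf, _ = win.Shared_query(0)
    shared = np.ndarray(buffer=buf, dtype='d', shape=(n,))

    # Group members fill their slices via plain loads/stores (PGAS-style) ...
    lo, hi = node.rank * n // node.size, (node.rank + 1) * n // node.size
    shared[lo:hi] = 1.0
    node.Barrier()

    # ... while group leaders still communicate across groups through MPI.
    leaders = world.Split(0 if node.rank == 0 else MPI.UNDEFINED, world.rank)
    if node.rank == 0:
        total = leaders.allreduce(shared.sum(), op=MPI.SUM)
        if leaders.rank == 0:
            print('global sum:', total)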

  6. Parallel algorithms for mapping pipelined and parallel computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
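
    For concreteness, here is a minimal dynamic program for the underlying mapping problem: assigning a chain of m modules to n processors in contiguous blocks so as to minimize the bottleneck load. This O(nm^2) sketch is our own illustration and is far less efficient than the improved algorithms the paper describes.

    def min_bottleneck_mapping(weights, n):
        """Map a chain of m modules onto n processors as contiguous
        blocks, minimizing the largest per-processor load."""
        m = len(weights)
        prefix = [0.0]
        for w in weights:
            prefix.append(prefix[-1] + w)
        INF = float('inf')
        # best[j]: minimal bottleneck for the first j modules (1 processor)
        best = [0.0] + [prefix[j] for j in range(1, m + 1)]
        for _ in range(2, n + 1):             # add one processor at a time
            new = [0.0] + [INF] * m
            for j in range(1, m + 1):
                for i in range(j):            # last block is modules i..j-1
                    new[j] = min(new[j], max(best[i], prefix[j] - prefix[i]))
            best = new
        return best[m]

    print(min_bottleneck_mapping([4, 2, 7, 1, 3], 2))   # -> 11 (4,2 | 7,1,3)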

  7. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-03-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processors. User programs and their gangs of processors are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.

  8. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-12-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processes. User programs and their gangs of processes are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory.

  9. Medipix2 parallel readout system

    NASA Astrophysics Data System (ADS)

    Fanti, V.; Marzeddu, R.; Randaccio, P.

    2003-08-01

    A fast parallel readout system based on a PCI board has been developed in the framework of the Medipix collaboration. The readout electronics consists of two boards: the motherboard directly interfacing the Medipix2 chip, and the PCI board with digital I/O ports 32 bits wide. The device driver and readout software have been developed at low level in Assembler to allow fast data transfer and image reconstruction. The parallel readout permits a transfer rate up to 64 Mbytes/s. http://medipix.web.cern.ch/MEDIPIX/

  10. Parallelization of the SIR code

    NASA Astrophysics Data System (ADS)

    Thonhofer, S.; Bellot Rubio, L. R.; Utz, D.; Jurčak, J.; Hanslmeier, A.; Piantschitsch, I.; Pauritsch, J.; Lemmerer, B.; Guttenbrunner, S.

    A high-resolution 3-dimensional model of the photospheric magnetic field is essential for the investigation of small-scale solar magnetic phenomena. The SIR code is an advanced Stokes-inversion code that deduces physical quantities, e.g. magnetic field vector, temperature, and LOS velocity, from spectropolarimetric data. We extended this code with the capability to handle large data sets directly and to invert the pixels in parallel. This parallelization makes it feasible to apply the code directly to extensive data sets. In addition, we included the possibility of using different initial model atmospheres for the inversion, which enhances the quality of the results.
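
    The per-pixel independence described above is what makes this inversion embarrassingly parallel. A hedged Python sketch of the decomposition (the real SIR code is Fortran; invert_pixel below is a hypothetical stand-in for a single Stokes inversion):

    from multiprocessing import Pool

    def invert_pixel(task):
        """Hypothetical stand-in for one SIR Stokes inversion."""
        (x, y), profile = task
        # A real inversion fits a model atmosphere to the observed
        # Stokes profiles; here we return a placeholder "result".
        result = sum(profile) / len(profile)
        return (x, y), result

    def invert_map(profiles, workers=8):
        """profiles: dict mapping (x, y) -> Stokes profile (a sequence)."""
        with Pool(workers) as pool:
            # Pixels are independent, so they map cleanly onto a pool.
            return dict(pool.map(invert_pixel, list(profiles.items())))

    if __name__ == '__main__':
        fake = {(x, y): [x + y, x - y, 1.0] for x in range(4) for y in range(4)}
        print(invert_map(fake, workers=2))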

  11. The Complexity of Parallel Algorithms,

    DTIC Science & Technology

    1985-11-01

    Much of this work was done in collaboration with my advisor, Ernst Mayr. He was also supported in part by ONR contract N00014-85-C-0731. ... Helmbold and Mayr in their algorithm to compute an optimal two-processor schedule [HM2]. One of the promising developments in parallel algorithms is that ... can be solved by a fast parallel algorithm if the numbers are small. Helmbold and Mayr [HM1] have shown that, if the job times are

  12. Fast parallel algorithm for CT image reconstruction.

    PubMed

    Flores, Liubov A; Vidal, Vicent; Mayo, Patricia; Rodenas, Francisco; Verdú, Gumersindo

    2012-01-01

    In X-ray computed tomography (CT), X-rays are used to obtain the projection data needed to generate an image of the inside of an object. The image can be generated with different techniques. Iterative methods are more suitable for the reconstruction of images with high contrast and precision in noisy conditions and from a small number of projections. Their use may be important in portable scanners for their functionality in emergency situations. However, in practice, these methods are not widely used due to the high computational cost of their implementation. In this work we analyze iterative parallel image reconstruction with the Portable, Extensible Toolkit for Scientific Computation (PETSc).
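
    As a generic illustration of the iterative family referred to above (not the authors' PETSc implementation), a minimal NumPy sketch of the classic Kaczmarz/ART update, which sweeps over projection rows and corrects the image estimate row by row:

    import numpy as np

    def kaczmarz(A, b, sweeps=10, relax=1.0):
        """A : (n_rays, n_pixels) system matrix of the projections,
        b : (n_rays,) measured projection data."""
        x = np.zeros(A.shape[1])
        row_norms = (A * A).sum(axis=1)
        for _ in range(sweeps):
            for i in range(A.shape[0]):
                if row_norms[i] == 0:
                    continue
                r = b[i] - A[i] @ x            # residual of one ray
                x += relax * r / row_norms[i] * A[i]
        return x

    # Tiny example: recover x from a consistent, noiseless system.
    A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    b = A @ np.array([2.0, -1.0])
    print(kaczmarz(A, b, sweeps=50))           # -> approx [2, -1]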

  13. Parallel Proximity Detection for Computer Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1998-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  14. Parallel Proximity Detection for Computer Simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1997-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
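
    A toy sketch of the grid check-in/check-out idea from these patents, with sensor coverage enlarged by a fuzzy margin so that exact grid crossings need not be computed; the class and names are ours, for illustration only:

    from collections import defaultdict

    CELL = 10.0   # grid cell size
    FUZZ = 2.0    # fuzzy margin: coverage need not track exact crossings

    def cell_of(x, y):
        return (int(x // CELL), int(y // CELL))

    class ProximityGrid:
        """Movers occupy one cell; sensors register coverage over a
        (fuzzily enlarged) range of cells; detection is a cell lookup."""

        def __init__(self):
            self.sensors = defaultdict(set)   # cell -> sensor ids covering it

        def sensor_check_in(self, sid, x, y, radius):
            r = radius + FUZZ                 # enlarge by the fuzzy margin
            for cx in range(int((x - r) // CELL), int((x + r) // CELL) + 1):
                for cy in range(int((y - r) // CELL), int((y + r) // CELL) + 1):
                    self.sensors[(cx, cy)].add(sid)

        def detectors_of(self, x, y):
            """Sensors whose (fuzzy) coverage contains the mover's cell."""
            return self.sensors[cell_of(x, y)]

    grid = ProximityGrid()
    grid.sensor_check_in('radar-1', 25.0, 25.0, radius=12.0)
    print(grid.detectors_of(30.0, 30.0))      # -> {'radar-1'}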

  15. Quality of life in patients with functional dyspepsia: Short- and long-term effect of Helicobacter pylori eradication with pantoprazole, amoxicillin, and clarithromycin or cisapride therapy: A prospective, parallel-group study

    PubMed Central

    Buzás, György M.

    2006-01-01

    Background: Quality of life (QOL) is impaired in functional dyspepsia (FD). Little is known about the effects of different therapies on the QOL profile in patients with this condition. Objectives: The aims of this study were to measure baseline QOL in patients with FD and to assess changes in QOL over time associated with Helicobacter pylori eradication and prokinetic treatment. The primary and secondary end points were the improvement in QOL 6 weeks and 1 year after successful eradication of the infection or prokinetic therapy. Methods: This 1-year, single-center, prospective, open-label, controlled, parallel-group trial was conducted at the Department of Gastroenterology, Ferencváros Health Centre, Budapest, Hungary. The Functional Digestive Disorder Quality of Life (FDDQoL) Questionnaire (MAPI Research Institute, Lyon, France) was translated and validated previously in Hungarian. Male and female subjects aged 20 to 60 years were enrolled and classified as H pylori positive (HP+), H pylori negative (HP-) with FD, or healthy (control group). The HP+ patients received pantoprazole 40 mg BID + amoxicillin 1000 mg BID + clarithromycin 500 mg BID for 7 days, followed by on-demand ranitidine (150–300 mg/d) for 1 year. The HP- patients received the prokinetic cisapride 10 mg TID for 1 month, followed by on-demand cisapride (10–20 mg/d) for 1 year. The FDDQoL questionnaire was completed by all 3 groups on enrollment, at 6 weeks, and at 1 year. Results: A total of 101 HP+ patients, 98 HP- patients, and 123 healthy controls were included in the study (185 women, 137 men; mean age, 39.0 years). The mean (SD) baseline QOL scores were significantly lower in the HP+ group (53.3 [9.6]; 95% CI, 54.4-58.2) and the HP- group (50.0 [9.8]; 95% CI, 58.0–62.0) compared with that in healthy controls (76.2 [8.7]; 95% CI, 74.6–77.8) (both, P < 0.001). Analysis of the short-term domain scores found that the HP+ group had significantly decreased scores in 6 of 8 domains: daily

  16. Utilizing parallel optimization in computational fluid dynamics

    NASA Astrophysics Data System (ADS)

    Kokkolaras, Michael

    1998-12-01

    General problems of interest in computational fluid dynamics are investigated by means of optimization. Specifically, in the first part of the dissertation, a method of optimal incremental function approximation is developed for the adaptive solution of differential equations. Various concepts and ideas utilized by numerical techniques employed in computational mechanics and artificial neural networks (e.g. function approximation and error minimization, variational principles and weighted residuals, and adaptive grid optimization) are combined to formulate the proposed method. The basis functions and associated coefficients of a series expansion, representing the solution, are optimally selected by a parallel direct search technique at each step of the algorithm according to appropriate criteria; the solution is built sequentially. In this manner, the proposed method is adaptive in nature, although a grid is neither built nor adapted in the traditional sense using a-posteriori error estimates. Variational principles are utilized for the definition of the objective function to be extremized in the associated optimization problems, ensuring that the problem is well-posed. Complicated data structures and expensive remeshing algorithms and systems solvers are avoided. Computational efficiency is increased by using low-order basis functions and concurrent computing. Numerical results and convergence rates are reported for a range of steady-state problems, including linear and nonlinear differential equations associated with general boundary conditions, and illustrate the potential of the proposed method. Fluid dynamics applications are emphasized. Conclusions are drawn by discussing the method's limitations, advantages, and possible extensions. The second part of the dissertation is concerned with the optimization of the viscous-inviscid-interaction (VII) mechanism in an airfoil flow analysis code. The VII mechanism is based on the concept of a transpiration velocity

  17. Tutorial: Parallel Simulation on Supercomputers

    SciTech Connect

    Perumalla, Kalyan S

    2012-01-01

    This tutorial introduces typical hardware and software characteristics of extant and emerging supercomputing platforms, and presents issues and solutions in executing large-scale parallel discrete event simulation scenarios on such high performance computing systems. Covered topics include synchronization, model organization, example applications, and observed performance from illustrative large-scale runs.

  18. Sequential and Parallel Matrix Computations.

    DTIC Science & Technology

    1984-10-01

    ... value decomposition and least squares solutions, Numer. Math. 14 (1970), 403-420. 22. J. Grcar and A. Sameh, On certain parallel Toeplitz linear system ... Zur Stabilitätsfrage bei Matrizen-Eigenwert-Problemen, Z. Angew. Math. Phys. (1956), 473-500. 36. D. L. Slotnick and A. H. Sameh, Numerical calculation

  19. Parallel Algorithms for PDE Solvers

    DTIC Science & Technology

    1988-07-15

    This report lists all of the 39 scientific publications, theses, technical reports and conference presentations supported by the grant AFOSR 84-0385. The principal focus of the results is on 1) The Collocation Method: New versions developed for parallel machines, new results on the convergence and new

  20. Fast, Massively Parallel Data Processors

    NASA Technical Reports Server (NTRS)

    Heaton, Robert A.; Blevins, Donald W.; Davis, ED

    1994-01-01

    Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid and with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.

  1. Optical Interferometric Parallel Data Processor

    NASA Technical Reports Server (NTRS)

    Breckinridge, J. B.

    1987-01-01

    Image data processed faster than in present electronic systems. Optical parallel-processing system effectively calculates two-dimensional Fourier transforms in time required by light to travel from plane 1 to plane 8. Coherence interferometer at plane 4 splits light into parts that form double image at plane 6 if projection screen placed there.

  2. [Falsified medicines in parallel trade].

    PubMed

    Muckenfuß, Heide

    2017-09-13

    The number of falsified medicines on the German market has distinctly increased over the past few years. In particular, stolen pharmaceutical products, a form of falsified medicines, have increasingly been introduced into the legal supply chain via parallel trading. The reasons why parallel trading serves as a gateway for falsified medicines are most likely the complex supply chains and routes of transport. It is hardly possible for national authorities to trace the history of a medicinal product that was bought and sold by several intermediaries in different EU member states. In addition, the heterogeneous outward appearance of imported and relabelled pharmaceutical products facilitates the introduction of illegal products onto the market. Official batch release at the Paul-Ehrlich-Institut offers the possibility of checking some aspects that might provide an indication of a falsified medicine. In some circumstances, this may allow the identification of falsified medicines before they come onto the German market. However, this control is only possible for biomedicinal products that have not received a waiver regarding official batch release. For improved control of parallel trade, better networking among the EU member states would be beneficial. European-wide regulations, e. g., for disclosure of the complete supply chain, would help to minimise the risks of parallel trading and hinder the marketing of falsified medicines.

  3. Parallel distributed computing using Python

    NASA Astrophysics Data System (ADS)

    Dalcin, Lisandro D.; Paz, Rodrigo R.; Kler, Pablo A.; Cosimo, Alejandro

    2011-09-01

    This work presents two software components aimed to relieve the costs of accessing high-performance parallel computing resources within a Python programming environment: MPI for Python and PETSc for Python. MPI for Python is a general-purpose Python package that provides bindings for the Message Passing Interface (MPI) standard using any back-end MPI implementation. Its facilities allow parallel Python programs to easily exploit multiple processors using the message passing paradigm. PETSc for Python provides access to the Portable, Extensible Toolkit for Scientific Computation (PETSc) libraries. Its facilities allow sequential and parallel Python applications to exploit state of the art algorithms and data structures readily available in PETSc for the solution of large-scale problems in science and engineering. MPI for Python and PETSc for Python are fully integrated into PETSc-FEM, an MPI and PETSc based parallel, multiphysics, finite elements code developed at CIMEC laboratory. This software infrastructure supports research activities related to simulation of fluid flows with applications ranging from the design of microfluidic devices for biochemical analysis to modeling of large-scale stream/aquifer interactions.
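
    A minimal example of the MPI for Python facilities described above, run under an MPI launcher (e.g. mpiexec -n 4 python demo.py); the script itself is our illustration, not code from the paper:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each process computes a partial sum; a collective combines them.
    n = 1_000_000
    local = np.arange(rank, n, size, dtype='d').sum()
    total = comm.allreduce(local, op=MPI.SUM)

    if rank == 0:
        print(f'sum of 0..{n - 1} computed on {size} ranks: {total:.0f}')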

  4. Parallel coprocessors speed graphics system

    SciTech Connect

    Mcewan, C.

    1983-05-26

    Up to five parallel coprocessors, a pipelined architecture and display-list data structures combine to create Ramtek Corporation's fast, modular raster graphics system, which is upgradable with software. It is stated that the system meets the needs of most CAD/CAM and simulation graphics applications. A 32-bit VMEbus structure is used.

  5. Parallel, Distributed Scripting with Python

    SciTech Connect

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadmin tools such as password crackers, file purgers, etc. These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000-word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.
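
    A hedged sketch of that dictionary-check example in modern Python, using multiprocessing rather than the author's MPI-based approach and SHA-256 in place of crypt(); all names are ours:

    import hashlib
    from multiprocessing import Pool

    def matches(args):
        target_hash, word = args
        ok = hashlib.sha256(word.encode()).hexdigest() == target_hash
        return word if ok else None

    def crack(target_hash, dictionary, workers=4):
        with Pool(workers) as pool:
            # Words are checked independently, so the dictionary splits
            # trivially among workers.
            tasks = ((target_hash, w) for w in dictionary)
            for hit in pool.imap_unordered(matches, tasks):
                if hit:
                    return hit
        return None

    if __name__ == '__main__':
        words = ['alpha', 'bravo', 'charlie', 'hunter2']
        target = hashlib.sha256(b'hunter2').hexdigest()
        print(crack(target, words))           # -> hunter2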

  6. File concepts for parallel I/O

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1989-01-01

    The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, and implementations using multiple storage devices are suggested. Problem areas are also identified and discussed.

  7. Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

    PubMed

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-06-01

    We present l₁-SPIRiT, a simple algorithm for autocalibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
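
    The core shrinkage step in an iteration of this kind is joint (cross-channel) soft-thresholding of wavelet coefficients. A minimal NumPy sketch of that operator, as a hedged illustration rather than the authors' code:

    import numpy as np

    def joint_soft_threshold(coeffs, lam):
        """coeffs : (n_channels, ...) complex wavelet coefficients.
        Channels at each location are shrunk together using their joint
        magnitude, which promotes cross-channel joint sparsity."""
        mag = np.sqrt((np.abs(coeffs) ** 2).sum(axis=0, keepdims=True))
        scale = np.maximum(1 - lam / np.maximum(mag, 1e-12), 0)
        return coeffs * scale

    c = np.array([[3 + 4j], [0 + 0j]])       # two channels, one coefficient
    print(joint_soft_threshold(c, lam=2.5))  # joint magnitude 5 -> scaled by 0.5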

  8. Parallelizing AT with MatlabMPI

    SciTech Connect

    Li, Evan Y.; /Brown U. /SLAC

    2011-06-22

    The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations, rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, which set up the necessary prerequisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate incredibly efficient speed increments per processor in AT's beam-tracking functions. Extrapolating from prediction, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well-understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while

  9. Methods of parallel computation applied on granular simulations

    NASA Astrophysics Data System (ADS)

    Martins, Gustavo H. B.; Atman, Allbens P. F.

    2017-06-01

    Every year, parallel computing becomes cheaper and more accessible. As a consequence, applications are spreading across all research areas. Granular materials are a promising area for parallel computing. To prove this statement we study the impact of parallel computing on simulations of the BNE (Brazil Nut Effect), the remarkable rise of an intruder confined in a granular medium when it is vertically shaken against gravity. By means of DEM (Discrete Element Method) simulations, we study the code performance by testing different methods to improve wall-clock time. A comparison between serial and parallel algorithms, using OpenMP®, is also shown. The best improvement was obtained by optimizing the function that finds contacts using Verlet cells.

  10. Parallel global optimization with the particle swarm algorithm.

    PubMed

    Schutte, J F; Reinbolt, J A; Fregly, B J; Haftka, R T; George, A D

    2004-12-07

    Present day engineering optimization problems often impose large computational demands, resulting in long solution times even on a modern high-end processor. To obtain enhanced computational throughput and global search capability, we detail the coarse-grained parallelization of an increasingly popular global search method, the particle swarm optimization (PSO) algorithm. Parallel PSO performance was evaluated using two categories of optimization problems possessing multiple local minima: large-scale analytical test problems with computationally cheap function evaluations, and medium-scale biomechanical system identification problems with computationally expensive function evaluations. For load-balanced analytical test problems formulated using 128 design variables, speedup was close to ideal and parallel efficiency above 95% for up to 32 nodes on a Beowulf cluster. In contrast, for load-imbalanced biomechanical system identification problems with 12 design variables, speedup plateaued and parallel efficiency decreased almost linearly with increasing number of nodes. The primary factor affecting parallel performance was the synchronization requirement of the parallel algorithm, which dictated that each iteration must wait for completion of the slowest fitness evaluation. When the analytical problems were solved using a fixed number of swarm iterations, a single population of 128 particles produced a better convergence rate than did multiple independent runs performed using sub-populations (8 runs with 16 particles, 4 runs with 32 particles, or 2 runs with 64 particles). These results suggest that (1) parallel PSO exhibits excellent parallel performance under load-balanced conditions, (2) an asynchronous implementation would be valuable for real-life problems subject to load imbalance, and (3) larger population sizes should be considered when multiple processors are available.
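
    The synchronization bottleneck identified above is easy to see in code: in a synchronous parallel PSO, every iteration blocks on the slowest fitness evaluation. A small illustrative sketch (our own, using multiprocessing rather than the paper's cluster setup):

    import numpy as np
    from multiprocessing import Pool

    def fitness(x):
        # Stand-in for an expensive evaluation (e.g. a biomechanics model).
        return float(np.sum(x * x))

    def parallel_pso(dim=12, particles=32, iters=50, workers=8, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.uniform(-5, 5, (particles, dim))
        v = np.zeros_like(x)
        pbest, pbest_f = x.copy(), np.full(particles, np.inf)
        with Pool(workers) as pool:
            for _ in range(iters):
                # Synchronous step: pool.map blocks until the *slowest*
                # fitness evaluation finishes -- the bottleneck the paper
                # identifies under load imbalance.
                f = np.array(pool.map(fitness, list(x)))
                better = f < pbest_f
                pbest[better], pbest_f[better] = x[better], f[better]
                g = pbest[np.argmin(pbest_f)]
                r1, r2 = rng.random(x.shape), rng.random(x.shape)
                v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
                x = x + v
        return g, pbest_f.min()

    if __name__ == '__main__':
        print(parallel_pso())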

  11. Broadband monitoring simulation with massively parallel processors

    NASA Astrophysics Data System (ADS)

    Trubetskov, Mikhail; Amotchkina, Tatiana; Tikhonravov, Alexander

    2011-09-01

    Modern efficient optimization techniques, namely needle optimization and gradual evolution, enable one to design optical coatings of any type. Moreover, these techniques allow obtaining multiple solutions with close spectral characteristics. It is important, therefore, to develop software tools that allow one to choose a practically optimal solution from a wide variety of possible theoretical designs. A practically optimal solution provides the highest production yield when the optical coating is manufactured. Computational manufacturing is a low-cost tool for choosing a practically optimal solution. The theory of probability predicts that reliable production yield estimations require many hundreds or even thousands of computational manufacturing experiments. As a result, reliable estimation of the production yield may require too much computational time. The most time-consuming operation is calculation of the discrepancy function used by a broadband monitoring algorithm. This function is formed by a sum of terms over a wavelength grid. These terms can be computed simultaneously in different threads of computation, which opens great opportunities for parallelization. Multi-core and multi-processor systems can provide speedups of up to several times. Additional potential for further acceleration of computations is connected with using Graphics Processing Units (GPUs). A modern GPU consists of hundreds of massively parallel processors and is capable of performing floating-point operations efficiently.

  12. Parallel payers, privatization and two-tier healthcare in Canada.

    PubMed

    Davidson, Alan

    2008-01-01

    The commissioning of care by Workers' Compensation Boards alongside provincial healthcare insurance plans functions in Canada in much the same way as parallel private insurance functions in countries like England and Australia. Parallel payers introduce policy conflict, undermine equity and promote privatization. WCB demands for expedited care for injured workers create challenges for the efficiency and fairness of the healthcare system. Unfortunately, the legitimate policy goals of WCBs and universal healthcare insurance are difficult to reconcile in the real world of Canadian healthcare policy.

  13. Predicting mining activity with parallel genetic algorithms

    USGS Publications Warehouse

    Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.; ,; ,

    2005-01-01

    We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.
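
    For reference, the Kappa statistic used above compares the observed agreement between predicted and ground-truth maps with the agreement expected by chance; a minimal sketch (our own code, not the authors'):

    import numpy as np

    def kappa(truth, pred):
        """Cohen's kappa for two categorical maps (flattened arrays)."""
        truth, pred = np.asarray(truth).ravel(), np.asarray(pred).ravel()
        p_o = np.mean(truth == pred)                  # observed agreement
        p_e = sum(np.mean(truth == c) * np.mean(pred == c)
                  for c in np.union1d(truth, pred))   # chance agreement
        return (p_o - p_e) / (1 - p_e)

    print(kappa([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]))   # -> 0.333...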

  14. Parallel supercomputing with commodity components

    NASA Technical Reports Server (NTRS)

    Warren, M. S.; Goda, M. P.; Becker, D. J.

    1997-01-01

    We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10^15 floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.

  15. Parallel supercomputing with commodity components

    SciTech Connect

    Warren, M.S.; Goda, M.P.; Becker, D.J.

    1997-09-01

    We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10^15 floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.

  16. Parallel multiplex laser feedback interferometry

    SciTech Connect

    Zhang, Song; Tan, Yidong; Zhang, Shulian

    2013-12-15

    We present a parallel multiplex laser feedback interferometer based on spatial multiplexing which avoids the signal crosstalk in the former feedback interferometer. The interferometer outputs two close parallel laser beams, whose frequencies are shifted by two acousto-optic modulators by 2Ω simultaneously. A static reference mirror is inserted into one of the optical paths as the reference optical path. The other beam impinges on the target as the measurement optical path. Phase variations of the two feedback laser beams are simultaneously measured through heterodyne demodulation with two different detectors. Their subtraction accurately reflects the target displacement. Under typical room conditions, experimental results show a resolution of 1.6 nm and accuracy of 7.8 nm within the range of 100 μm.

  17. A generalized parallel replica dynamics

    SciTech Connect

    Binder, Andrew; Lelièvre, Tony; Simpson, Gideon

    2015-03-01

    Metastability is a common obstacle to performing long molecular dynamics simulations. Many numerical methods have been proposed to overcome it. One method is parallel replica dynamics, which relies on the rapid convergence of the underlying stochastic process to a quasi-stationary distribution. Two requirements for applying parallel replica dynamics are knowledge of the time scale on which the process converges to the quasi-stationary distribution and a mechanism for generating samples from this distribution. By combining a Fleming–Viot particle system with convergence diagnostics to simultaneously identify when the process converges while also generating samples, we can address both points. This variation on the algorithm is illustrated with various numerical examples, including those with entropic barriers and the 2D Lennard-Jones cluster of seven atoms.

  18. Merlin - Massively parallel heterogeneous computing

    NASA Technical Reports Server (NTRS)

    Wittie, Larry; Maples, Creve

    1989-01-01

    Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.

  19. ASP: a parallel computing technology

    NASA Astrophysics Data System (ADS)

    Lea, R. M.

    1990-09-01

    ASP modules constitute the basis of a parallel computing technology platform for the rapid development of a broad range of numeric and symbolic information processing systems. Based on off-the-shelf general-purpose hardware and software modules, ASP technology is intended to increase productivity in the development (and competitiveness in the marketing) of cost-effective low-MIMD/high-SIMD Massively Parallel Processors (MPPs). The paper discusses ASP module philosophy and demonstrates how ASP modules can satisfy the market, algorithmic, architectural and engineering requirements of such MPPs. In particular, two specific ASP modules based on VLSI and WSI technologies are studied as case examples of ASP technology, the latter reporting 1 TOPS/ft^3, 1 GOPS/W and 1 MOPS/$ as ball-park figures of merit of cost-effectiveness.

  20. Parallel processing spacecraft communication system

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)

    1998-01-01

    An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used, and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID. Each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If that channel number is zero, however, it indicates that the frame of data represents a critical command only. That data is handled in a special way, independent of the software. Otherwise, the processed data is further handled using special double-buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.

  1. A generalized parallel replica dynamics

    NASA Astrophysics Data System (ADS)

    Binder, Andrew; Lelièvre, Tony; Simpson, Gideon

    2015-03-01

    Metastability is a common obstacle to performing long molecular dynamics simulations. Many numerical methods have been proposed to overcome it. One method is parallel replica dynamics, which relies on the rapid convergence of the underlying stochastic process to a quasi-stationary distribution. Two requirements for applying parallel replica dynamics are knowledge of the time scale on which the process converges to the quasi-stationary distribution and a mechanism for generating samples from this distribution. By combining a Fleming-Viot particle system with convergence diagnostics to simultaneously identify when the process converges while also generating samples, we can address both points. This variation on the algorithm is illustrated with various numerical examples, including those with entropic barriers and the 2D Lennard-Jones cluster of seven atoms.

  2. Parallel supercomputing with commodity components

    NASA Technical Reports Server (NTRS)

    Warren, M. S.; Goda, M. P.; Becker, D. J.

    1997-01-01

    We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10^15 floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.

  3. Merlin - Massively parallel heterogeneous computing

    NASA Technical Reports Server (NTRS)

    Wittie, Larry; Maples, Creve

    1989-01-01

    Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.

  4. High performance parallel implicit CFD.

    SciTech Connect

    Gropp, W. D.; Kaushik, D. K.; Keyes, D. E.; Smith, B. F.; Mathematics and Computer Science; Old Dominion Univ.

    2001-03-01

    Fluid dynamical simulations based on finite discretizations on (quasi-)static grids scale well in parallel, but execute at a disappointing percentage of per-processor peak floating point operation rates without special attention to layout and access ordering of data. We document both claims from our experience with an unstructured grid CFD code that is typical of the state of the practice at NASA. These basic performance characteristics of PDE-based codes can be understood with surprisingly simple models, for which we quote earlier work, presenting primarily experimental results. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per node performance. This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers.

  5. Task parallelism and high-performance languages

    SciTech Connect

    Foster, I.

    1996-03-01

    The definition of High Performance Fortran (HPF) is a significant event in the maturation of parallel computing: it represents the first parallel language that has gained widespread support from vendors and users. The subject of this paper is the incorporation of support for task parallelism. The term task parallelism refers to the explicit creation of multiple threads of control, or tasks, which synchronize and communicate under programmer control. Task and data parallelism are complementary rather than competing programming models. While task parallelism is more general and can be used to implement algorithms that are not amenable to data-parallel solutions, many problems can benefit from a mixed approach, with, for example, a task-parallel coordination layer integrating multiple data-parallel computations. Other problems admit both data- and task-parallel solutions, with the better solution depending on machine characteristics, compiler performance, or personal taste. For these reasons, we believe that a general-purpose high-performance language should integrate both task- and data-parallel constructs. The challenge is to do so in a way that provides the expressivity needed for applications, while preserving the flexibility and portability of a high-level language. In this paper, we examine and illustrate the considerations that motivate the use of task parallelism. We also describe one particular approach to task parallelism in Fortran, namely the Fortran M extensions. Finally, we contrast Fortran M with other proposed approaches and discuss the implications of this work for task parallelism and high-performance languages.

  6. Supporting data intensive applications with medium grained parallelism

    SciTech Connect

    Pfaltz, J.L.; French, J.C.; Grimshaw, A.S.; Son, S.H.

    1992-04-01

    ADAMS is an ambitious effort to provide new database access paradigms for the kinds of scientific applications that require massively parallel access to very large data sets in order to be effective. Many of the Grand Challenge Problems fall into this category, as well as those kinds of scientific research which depend on widely distributed shared sets of disparate data. The essence of the ADAMS approach is to view data purely in functional terms, rather than the more traditional structural view in which multiple data items are aggregated into records or tuples of flat files. Further, ADAMS has been implemented as an embedded interface so that scientists can develop applications in the host programming language of their choice, often Fortran, Pascal, or C, and still access shared data generated in other environments. The syntax and semantics of ADAMS are essentially complete. The functional nature of the ADAMS data interface paradigm simplifies its implementation in a distributed environment, e.g., the Mentat run-time system, because one must only distribute functional servers, not pieces of data structures. However, this only opens up the possibility of effective parallel database processing; to realize this potential far more work must be done in the areas of data dependence, intra-statement parallelism, parallel query optimization, and maintaining consistency and reliability in concurrent systems. Discovering how to make effective parallel data access an actuality in real scientific applications is the point of this research.

  7. Parallel Symmetric Eigenvalue Problem Solvers

    DTIC Science & Technology

    2015-05-01

    Keywords: trace minimization, saddle-point problems, lowest eigenpairs, sampling the spectrum. ... eigenpairs or seeking a large number of eigenpairs in any interval of the spectrum. Numerical experiments demonstrate clearly that Trace Minimization is a ... the Fiedler vector ... computing interior eigenpairs via spectrum folding ... my parallel

  8. CSRD Parallel Service Machine Enhancement

    DTIC Science & Technology

    1989-11-30

    ... problems by C. Kamath and A. Sameh in [Kama86]. Several theoretical and numerical results have been obtained for RP methods in the last year. Among them ... 145-163, 19 (1979). [BrSa88] R. Bramley and A. Sameh, A Robust Parallel Solver for Block Tridiagonal Systems, CSRD Technical Report #806, Center for Supercomputing Research and Development, University of Illinois, Urbana, 1988. [BrSa89a] R. Bramley and A. Sameh, Row Projection Methods for Large

  9. National Combustion Code: Parallel Performance

    NASA Technical Reports Server (NTRS)

    Babrauckas, Theresa

    2001-01-01

    This report discusses the National Combustion Code (NCC). The NCC is an integrated system of codes for the design and analysis of combustion systems. The advanced features of the NCC meet designers' requirements for model accuracy and turn-around time. The fundamental features at the inception of the NCC were parallel processing and unstructured mesh. The design and performance of the NCC are discussed.

  10. Parallelism in Manipulator Dynamics. Revision.

    DTIC Science & Technology

    1983-12-01

    excessive, and a VLSI implementation architecture is suggested. We indicate possible applications to incorporating dynamical considerations into ... Inverse Dynamics problem. It investigates the high degree of parallelism inherent in the computations, and presents two "mathematically exact" formulations ... The Inverse Dynamics problem consists (loosely) of computing the motor torques necessary to

  11. Lightweight Specifications for Parallel Correctness

    DTIC Science & Technology

    2012-12-05

    Professor George Necula, Professor David Wessel, Fall 2012. Lightweight Specifications for Parallel Correctness, by Jacob Samuels Burnim, Doctor of Philosophy. ... enthusiasm and endless flow of ideas, and for his keen research sense. I would also like to thank George Necula for chairing my qualifying exam committee and

  12. Parallel strategies for SAR processing

    NASA Astrophysics Data System (ADS)

    Segoviano, Jesus A.

    2004-12-01

    This article proposes a series of strategies for improving the computer processing of the Synthetic Aperture Radar (SAR) signal, following the three usual lines of action to speed up the execution of any computer program. On the one hand, the optimization of both the data structures and the application architecture is studied. On the other hand, hardware improvements are considered. For the former, the data structures usually employed in SAR processing are examined and parallel alternatives are proposed, along with the way the parallelization of the algorithms employed in the process is implemented. In addition, the parallel application architecture classifies processes as fine- or coarse-grained; these are assigned to individual processors or divided among processors, each in its corresponding architecture. For the latter, the hardware employed for parallel SAR processing is examined. The improvements here concern several kinds of platforms on which the SAR process is implemented: shared-memory multiprocessors and distributed-memory multicomputers. A comparison between them yields some guidelines for obtaining maximum throughput with minimum latency and maximum effectiveness with minimum cost, together with limited complexity. It is concluded that processing the algorithms in a GNU/Linux environment on a Beowulf cluster platform offers, under certain conditions, the best compromise between performance and cost, and promises the greatest future development for computationally demanding Synthetic Aperture Radar applications in the coming years.

  13. Parallel Power Grid Simulation Toolkit

    SciTech Connect

    Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol

    2015-09-14

    ParGrid is a 'wrapper' that integrates a coupled power grid simulation toolkit, consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete event parallel simulations. The code is designed using modern object-oriented C++ methods, utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  14. Highly parallel sparse Cholesky factorization

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1990-01-01

    Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.

  15. Parallel processing of genomics data

    NASA Astrophysics Data System (ADS)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, and the analysis of this enormous flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data, able to cope with high-dimensional data while achieving good response times. The proposed system is able to find statistically significant biological markers that discriminate between classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.

  16. Parallel Markov chain Monte Carlo simulations

    NASA Astrophysics Data System (ADS)

    Ren, Ruichao; Orkoulas, G.

    2007-06-01

    With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.

  17. Parallel Markov chain Monte Carlo simulations.

    PubMed

    Ren, Ruichao; Orkoulas, G

    2007-06-07

    With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.

  18. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  19. Computational efficiency of parallel combinatorial OR-tree searches

    NASA Technical Reports Server (NTRS)

    Li, Guo-Jie; Wah, Benjamin W.

    1990-01-01

    The performance of parallel combinatorial OR-tree searches is analytically evaluated. This performance depends on the complexity of the problem to be solved, the error allowance function, the dominance relation, and the search strategies. The exact performance may be difficult to predict due to the nondeterminism and anomalies of parallelism. The authors derive the performance bounds of parallel OR-tree searches with respect to the best-first, depth-first, and breadth-first strategies, and verify these bounds by simulation. They show that a near-linear speedup can be achieved with respect to a large number of processors for parallel OR-tree searches. Using the bounds developed, the authors derive sufficient conditions for assuring that parallelism will not degrade performance and necessary conditions for allowing parallelism to have a speedup greater than the ratio of the numbers of processors. These bounds and conditions provide the theoretical foundation for determining the number of processors required to assure a near-linear speedup.

  20. The probability of parallel genetic evolution from standing genetic variation.

    PubMed

    MacPherson, A; Nuismer, S L

    2017-02-01

    Parallel evolution is often assumed to result from repeated adaptation to novel, yet ecologically similar, environments. Here, we develop and analyse a mathematical model that predicts the probability of parallel genetic evolution from standing genetic variation as a function of the strength of phenotypic selection and constraints imposed by genetic architecture. Our results show that the probability of parallel genetic evolution increases with the strength of natural selection and effective population size and is particularly likely to occur for genes with large phenotypic effects. Building on these results, we develop a Bayesian framework for estimating the strength of parallel phenotypic selection from genetic data. Using extensive individual-based simulations, we show that our estimator is robust across a wide range of genetic and evolutionary scenarios and provides a useful tool for rigorously testing the hypothesis that parallel genetic evolution is the result of adaptive evolution. An important result that emerges from our analyses is that existing studies of parallel genetic evolution frequently rely on data that is insufficient for distinguishing between adaptive evolution and neutral evolution driven by random genetic drift. Overcoming this challenge will require sampling more populations and the inclusion of larger numbers of loci.

  1. Allinea DDT as a Parallel Debugging Alternative to Totalview

    SciTech Connect

    Antypas, K.B.

    2007-03-05

    Totalview, from the Etnus Corporation, is a sophisticated and feature-rich software debugger for parallel applications. As Totalview has gained popularity and market share, its pricing has increased to the point where it is often prohibitively expensive for massively parallel supercomputers, and many of its advanced features are not used by members of the scientific computing community. For these reasons, supercomputing centers have begun searching for a basic parallel debugging tool that can serve as a viable alternative. DDT (Distributed Debugging Tool) from Allinea Software is a relatively new parallel debugging tool which aims to provide much of the same functionality as Totalview. This review outlines the basic features and limitations of DDT to determine if it can be a reasonable substitute for Totalview. DDT was tested on the NERSC platforms Bassi, Seaborg, Jacquard and Davinci with Fortran90, C, and C++ codes using MPI and OpenMP for parallelism.

  2. Empirical study of parallel LRU simulation algorithms

    NASA Technical Reports Server (NTRS)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The other two SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
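
    For orientation, the quantity all five algorithms compute can be stated in a few lines of serial Python: a reference's LRU stack distance is its depth in the LRU stack, and a fully associative cache of size C hits exactly when the distance is at most C.

    ```python
    # Serial reference for the quantity the paper's parallel algorithms
    # compute: the LRU stack distance of every reference in a trace.
    def stack_distances(trace):
        stack, dists = [], []
        for ref in trace:
            if ref in stack:
                d = stack.index(ref) + 1      # depth from the top (1-based)
                stack.remove(ref)
            else:
                d = float("inf")              # cold miss at every cache size
            dists.append(d)
            stack.insert(0, ref)              # move to most-recently-used
        return dists

    trace = ["a", "b", "c", "a", "b", "a"]
    print(stack_distances(trace))             # [inf, inf, inf, 3, 3, 2]
    ```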

  3. Optical flow optimization using parallel genetic algorithm

    NASA Astrophysics Data System (ADS)

    Zavala-Romero, Olmo; Botella, Guillermo; Meyer-Bäse, Anke; Meyer Base, Uwe

    2011-06-01

    A new approach to optimize the parameters of a gradient-based optical flow model using a parallel genetic algorithm (GA) is proposed. The main characteristics of the optical flow algorithm are its bio-inspiration and robustness against contrast, static patterns and noise, besides working consistently with several optical illusions where other algorithms fail. This model depends on many parameters which determine the number of channels, the orientations required, and the length and shape of the kernel functions used in the convolution stage, among many others. The GA is used to find a set of parameters which improve the accuracy of the optical flow on inputs where the ground-truth data is available. This set of parameters helps to understand which of them are better suited for each type of input and can be used to estimate the parameters of the optical flow algorithm when used with videos that share similar characteristics. The proposed implementation takes into account the embarrassingly parallel nature of the GA and uses the OpenMP Application Programming Interface (API) to speed up the process of estimating an optimal set of parameters. The information obtained in this work can be used to dynamically reconfigure systems, with potential applications in robotics, medical imaging and tracking.
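
    The parallel structure is easy to sketch: fitness evaluations are independent, so they map onto a worker pool. The sketch below uses Python's multiprocessing in place of the paper's OpenMP, and the fitness function is a placeholder rather than the optical flow model.

    ```python
    # Hedged sketch of an embarrassingly parallel GA: evaluate all candidate
    # parameter vectors concurrently, keep an elite, mutate to refill.
    import numpy as np
    from multiprocessing import Pool

    rng = np.random.default_rng(2)

    def fitness(params):
        # Placeholder: real code would run optical flow with these parameters
        # and score it against ground-truth flow (e.g., average angular error).
        return -np.sum((params - 0.5) ** 2)

    def evolve(pop, generations=50, elite=10):
        with Pool() as pool:
            for _ in range(generations):
                scores = pool.map(fitness, list(pop))        # parallel part
                best = pop[np.argsort(scores)[-elite:]]
                children = best[rng.integers(0, elite, len(pop) - elite)]
                children = children + rng.normal(0, 0.05, children.shape)
                pop = np.vstack([best, children])
            scores = pool.map(fitness, list(pop))
        return pop[int(np.argmax(scores))]

    if __name__ == "__main__":
        print(evolve(rng.random((64, 8))))    # 64 candidates, 8 parameters
    ```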

  4. Development of a Parallel Redundant STATCOM System

    NASA Astrophysics Data System (ADS)

    Takeda, Masatoshi; Yasuda, Satoshi; Tamai, Shinzo; Morishima, Naoki

    This paper presents a new concept for a parallel redundant STATCOM system, consisting of a number of medium-capacity STATCOM units connected in parallel, which can achieve high operational reliability and functional flexibility. The proposed system operates redundantly: the remaining STATCOM units maintain operation even when some units are out of service. It is also flexible enough to be converted easily into a BTB or UPFC system as power-system needs change. To realize this concept, the authors developed several key technologies for the STATCOM, such as a novel PWM scheme that enables effective cancellation of lower-order harmonics, low-loss GCT inverter technologies, and a coordination control scheme with capacitor banks to ensure effective dynamic performance with minimum losses. The proposed STATCOM system was put into practical applications, exhibiting excellent performance characteristics at each site.

  5. The parallel I/O architecture of the High Performance Storage System (HPSS)

    SciTech Connect

    Watson, R.W.; Coyne, R.A.

    1995-02-01

    Rapid improvements in computational science, processing capability, main memory sizes, data collection devices, multimedia capabilities and integration of enterprise data are producing very large datasets (10s-100s of gigabytes to terabytes). This rapid growth of data has resulted in a serious imbalance in I/O and storage system performance and functionality. One promising approach to restoring balanced I/O and storage system performance is use of parallel data transfer techniques for client access to storage, device-to-device transfers, and remote file transfers. This paper describes the parallel I/O architecture and mechanisms, Parallel Transport Protocol, parallel FTP, and parallel client Application Programming Interface (API) used by the High Performance Storage System (HPSS). Parallel storage integration issues with a local parallel file system are also discussed.

  6. Parallel molecular dynamics: Communication requirements for massively parallel machines

    NASA Astrophysics Data System (ADS)

    Taylor, Valerie E.; Stevens, Rick L.; Arnold, Kathryn E.

    1995-05-01

    Molecular mechanics and dynamics are becoming widely used to perform simulations of molecular systems from large-scale computations of materials to the design and modeling of drug compounds. In this paper we address two major issues: a good decomposition method that can take advantage of future massively parallel processing systems for modest-sized problems in the range of 50,000 atoms, and the communication requirements needed to achieve 30 to 40% efficiency on MPPs. We analyzed a scalable benchmark molecular dynamics program executing on the Intel Touchstone Delta, parallelized with an interaction decomposition method. Using a validated analytical performance model of the code, we determined that for an MPP with a four-dimensional mesh topology and 400 MHz processors the communication startup time must be at most 30 clock cycles and the network bandwidth must be at least 2.3 GB/s. This configuration results in 30 to 40% efficiency of the MPP for a problem with 50,000 atoms executing on 50,000 processors.
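
    As a quick check of the quoted requirement, 30 clock cycles at 400 MHz corresponds to a startup latency of

    $$ t_{\text{startup}} \le \frac{30\ \text{cycles}}{400\ \text{MHz}} = 75\ \text{ns}, \qquad B \ge 2.3\ \text{GB/s}. $$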

  7. Parallelizing alternating direction implicit solver on GPUs

    USDA-ARS?s Scientific Manuscript database

    We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...

  8. Parallel computational fluid dynamics - Implementations and results

    NASA Technical Reports Server (NTRS)

    Simon, Horst D. (Editor)

    1992-01-01

    The present volume on parallel CFD discusses implementations on parallel machines, numerical algorithms for parallel CFD, and performance evaluation and computer science issues. Attention is given to a parallel algorithm for compressible flows through rotor-stator combinations, a massively parallel Euler solver for unstructured grids, a fast scheme to analyze 3D disk airflow on a parallel computer, and a block implicit multigrid solution of the Euler equations. Topics addressed include a 3D ADI algorithm on distributed memory multiprocessors, clustered element-by-element computations for fluid flow, hypercube FFT and the Fourier pseudospectral method, and an investigation of parallel iterative algorithms for CFD. Also discussed are fluid dynamics using interface methods on parallel processors, sorting for particle flow simulation on the connection machine, a large grain mapping method, and efforts toward a Teraflops capability for CFD.

  9. Implementing clips on a parallel computer

    NASA Technical Reports Server (NTRS)

    Riley, Gary

    1987-01-01

    The C Language Integrated Production System (CLIPS) is a forward-chaining rule-based language developed to provide training and delivery for expert systems. Conceptually, rule-based languages have great potential for benefiting from the inherent parallelism of the algorithms that they employ. During each cycle of execution, a knowledge base of information is compared against a set of rules to determine if any rules are applicable. Parallelism also can be employed for use with multiple cooperating expert systems. To investigate the potential benefits of using a parallel computer to speed up the comparison of facts to rules in expert systems, a parallel version of CLIPS was developed for the FLEX/32, a large-grain parallel computer. The FLEX implementation takes a macroscopic approach in achieving parallelism by splitting whole sets of rules among several processors rather than by splitting the components of an individual rule among processors. The parallel CLIPS prototype demonstrates the potential advantages of integrating expert system tools with parallel computers.
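
    The macroscopic approach, partitioning whole rule sets across processors while every processor sees all facts, can be caricatured in a few lines (toy numeric rules, not CLIPS syntax):

    ```python
    # Toy sketch: rule sets are split among workers; each worker matches
    # every fact against only its own slice of the rules.
    from multiprocessing import Pool

    RULES = [(f"r{k}", (lambda k: lambda fact: fact % (k + 2) == 0)(k))
             for k in range(8)]                      # 8 trivial numeric rules
    FACTS = list(range(1, 50))

    def match_partition(rule_slice):
        lo, hi = rule_slice
        return [(name, fact) for name, pred in RULES[lo:hi]
                for fact in FACTS if pred(fact)]

    if __name__ == "__main__":
        parts = [(0, 4), (4, 8)]                     # two "processors"
        with Pool(2) as pool:
            agenda = [m for part in pool.map(match_partition, parts)
                      for m in part]
        print(len(agenda), "activations")
    ```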

  10. The interaction of turbulence with parallel and perpendicular shocks

    NASA Astrophysics Data System (ADS)

    Adhikari, L.; Zank, G. P.; Hunana, P.; Hu, Q.

    2016-11-01

    Interplanetary shocks exist in most astrophysical flows and modify the properties of the background flow. We apply the six coupled turbulence transport model equations of Zank et al. (2012) to study the interaction of turbulence with parallel and perpendicular shock waves in the solar wind. We model the 1D structure of a stationary perpendicular or parallel shock wave using a hyperbolic tangent function and the Rankine-Hugoniot conditions. A reduced turbulence transport model (the 4-equation model) is applied to parallel and perpendicular shock waves and solved using a fourth-order Runge-Kutta method. We compare the model results with ACE spacecraft observations. We identify one quasi-parallel and one quasi-perpendicular event in the ACE data sets, and compute various observed turbulence quantities such as the fluctuating magnetic and kinetic energy, the energy in forward and backward propagating modes, and the total turbulent energy upstream and downstream of the shock. We also calculate the error associated with each observed quantity, and fit the observed values by a least-squares method using a Fourier-series fitting function. We find that the theoretical results are in reasonable agreement with observations. The energy in turbulent fluctuations is enhanced and the correlation length is approximately constant at the shock. Similarly, the normalized cross helicity increases across a perpendicular shock and decreases across a parallel shock.
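
    A minimal sketch of the shock representation described above, assuming a hydrodynamic Rankine-Hugoniot compression ratio and illustrative values for the Mach number and shock thickness:

    ```python
    # Stationary 1-D shock smoothed with a hyperbolic tangent; the downstream
    # level is set by the Rankine-Hugoniot density compression ratio.
    import numpy as np

    GAMMA, MACH, WIDTH = 5.0 / 3.0, 4.0, 0.05      # adiabatic index, M, thickness
    r = (GAMMA + 1) * MACH**2 / ((GAMMA - 1) * MACH**2 + 2)  # compression ratio

    def density(x, n1=1.0):
        """Upstream density n1 for x << 0, rising smoothly to r*n1 downstream."""
        return n1 * (1 + (r - 1) * 0.5 * (1 + np.tanh(x / WIDTH)))

    x = np.linspace(-1, 1, 9)
    print(np.round(density(x), 3))   # ~1 upstream, ~r (= 3.368 for M=4) downstream
    ```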

  11. High Performance Parallel Computational Nanotechnology

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. Inasmuch as it seems clear that the application of such methods in nanotechnology will require powerful, highly parallel systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to

  13. Parallel machine architecture and compiler design facilities

    NASA Technical Reports Server (NTRS)

    Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

    1990-01-01

    The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility for rapid prototyping of parallelizing compilers that can target different machine architectures) is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.

  14. Force user's manual: A portable, parallel FORTRAN

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1990-01-01

    The use of Force, a parallel, portable FORTRAN, on shared-memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray Y-MP, Convex 220, Flex/32, Encore, Sequent, and Alliant computers on which it is installed.

  15. Parallel multi-computers and artificial intelligence

    SciTech Connect

    Uhr, L.

    1986-01-01

    This book examines the present state and future direction of multicomputer parallel architectures for artificial intelligence research and the development of artificial intelligence applications. The book provides a survey of the large variety of parallel architectures, describing the current state of the art and suggesting promising architectures for producing artificial intelligence systems such as intelligent robots. It integrates the artificial intelligence and parallel processing research areas and discusses parallel processing from the viewpoint of artificial intelligence.

  16. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  17. Heart Fibrillation and Parallel Supercomputers

    NASA Technical Reports Server (NTRS)

    Kogan, B. Y.; Karplus, W. J.; Chudin, E. E.

    1997-01-01

    The Luo and Rudy 3 cardiac cell mathematical model is implemented on the CRAY T3D parallel supercomputer. The splitting algorithm, combined with a variable time step and an explicit method of integration, provides reasonable solution times and almost perfect scaling for rectilinear wave propagation. The computer simulation makes it possible to observe new phenomena: the break-up of spiral waves caused by intracellular calcium dynamics, and the non-uniformity of the calcium distribution in space during the onset of the spiral wave.

  18. Scheduling Tasks In Parallel Processing

    NASA Technical Reports Server (NTRS)

    Price, Camille C.; Salama, Moktar A.

    1989-01-01

    Algorithms sought to minimize time and cost of computation. Report describes research on scheduling of computational tasks in a system of multiple identical data processors operating in parallel. Computational intractability requires use of suboptimal heuristic algorithms. First algorithm, called the "list heuristic", is a variation of classical list scheduling. Second algorithm, called the "cluster heuristic", is applied to tightly coupled tasks and consists of four phases. Third algorithm, called the "exchange heuristic", is an iterative-improvement algorithm beginning with an initial feasible assignment of tasks to processors and periods of time. Fourth algorithm is an iterative one for optimal assignment of tasks, based on the concept of "simulated annealing" because of its mathematical resemblance to aspects of physical annealing processes.
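
    As an illustration of the fourth algorithm's idea (not the report's cost model), a simulated-annealing assignment of tasks to identical processors that minimizes the makespan:

    ```python
    # Simulated annealing for task-to-processor assignment (toy cost model).
    import math, random

    random.seed(3)
    TASKS = [random.randint(1, 20) for _ in range(30)]   # task durations
    P = 4                                                # identical processors

    def makespan(assign):
        loads = [0] * P
        for t, p in zip(TASKS, assign):
            loads[p] += t
        return max(loads)

    assign = [random.randrange(P) for _ in TASKS]
    best, temp = makespan(assign), 50.0
    while temp > 0.01:
        i = random.randrange(len(TASKS))
        old, assign[i] = assign[i], random.randrange(P)   # perturb one task
        delta = makespan(assign) - best
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            best += delta                                 # accept move
        else:
            assign[i] = old                               # reject, restore
        temp *= 0.999                                     # cool slowly
    print("makespan:", best, "lower bound:", math.ceil(sum(TASKS) / P))
    ```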

  19. True Shear Parallel Plate Viscometer

    NASA Technical Reports Server (NTRS)

    Ethridge, Edwin; Kaukler, William

    2010-01-01

    This viscometer (which can also be used as a rheometer) is designed for use with liquids over a large temperature range. The device consists of horizontally disposed, similarly sized, parallel plates with a precisely known gap. The lower plate is driven laterally with a motor to apply shear to the liquid in the gap. The upper plate is freely suspended from a double-arm pendulum with a sufficiently long radius to reduce height variations during the swing to negligible levels. A sensitive load cell measures the shear force applied by the liquid to the upper plate. Viscosity is measured by taking the ratio of shear stress to shear rate.
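
    The measurement principle reduces to Newton's definition of viscosity; with F the load-cell force, A the wetted plate area, v the lower-plate speed, and h the gap,

    $$ \eta \;=\; \frac{\text{shear stress}}{\text{shear rate}} \;=\; \frac{F/A}{v/h} \;=\; \frac{F\,h}{A\,v}. $$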

  20. Scalable Parallel Algebraic Multigrid Solvers

    SciTech Connect

    Bank, R; Lu, S; Tong, C; Vassilevski, P

    2005-03-23

    The authors propose a parallel algebraic multilevel algorithm (AMG), which has the novel feature that the subproblem residing in each processor is defined over the entire partition domain, although the vast majority of unknowns for each subproblem are associated with the partition owned by the corresponding processor. This feature ensures that a global coarse description of the problem is contained within each of the subproblems. The advantages of this approach are that interprocessor communication is minimized in the solution process while an optimal order of convergence rate is preserved; and the speed of local subproblem solvers can be maximized using the best existing sequential algebraic solvers.

  1. Parallel Assembly of LIGA Components

    SciTech Connect

    Christenson, T.R.; Feddema, J.T.

    1999-03-04

    In this paper, a prototype robotic workcell for the parallel assembly of LIGA components is described. A Cartesian robot is used to press 386 and 485 micron diameter pins into a LIGA substrate and then place a 3-inch diameter wafer with LIGA gears onto the pins. Upward and downward looking microscopes are used to locate holes in the LIGA substrate, pins to be pressed in the holes, and gears to be placed on the pins. This vision system can locate parts within 3 microns, while the Cartesian manipulator can place the parts within 0.4 microns.

  2. Parallel BLAST on split databases.

    PubMed

    Mathog, David R

    2003-09-22

    BLAST programs often run on large SMP machines where multiple threads can work simultaneously and there is enough memory to cache the databases between program runs. A group of programs is described which allows comparable performance to be achieved with a Beowulf configuration in which no node has enough memory to cache a database but the cluster as an aggregate does. To achieve this result, databases are split into equal-sized pieces and stored locally on each node. Each query is run on all nodes in parallel, and the resultant BLAST output files from all nodes are merged to yield the final output. Source code is available from ftp://saf.bio.caltech.edu/
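
    A sketch of the same split-run-merge pattern. Note it uses the modern BLAST+ command line (an assumption for illustration, not the paper's programs) and merges tabular hits by E-value, which is column 11 of `-outfmt 6`.

    ```python
    # Split-database pattern: search each database piece independently, then
    # merge the tabular hits by E-value.  In the Beowulf setting each search()
    # call runs on the node holding that piece; here they run sequentially.
    import subprocess

    PIECES = ["db_part0", "db_part1", "db_part2"]     # pre-split databases

    def search(query_fasta, db):
        out = subprocess.run(
            ["blastp", "-query", query_fasta, "-db", db, "-outfmt", "6"],
            capture_output=True, text=True, check=True).stdout
        return [line.split("\t") for line in out.splitlines()]

    def merged_hits(query_fasta):
        hits = [h for db in PIECES for h in search(query_fasta, db)]
        return sorted(hits, key=lambda h: float(h[10]))   # column 11 = E-value
    ```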

  4. Parallel Processing at the High School Level.

    ERIC Educational Resources Information Center

    Sheary, Kathryn Anne

    This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

  5. Parallel Computing Using Web Servers and "Servlets".

    ERIC Educational Resources Information Center

    Lo, Alfred; Bloor, Chris; Choi, Y. K.

    2000-01-01

    Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

  6. Reservoir Thermal Recovery Simulation on Parallel Computers

    NASA Astrophysics Data System (ADS)

    Li, Baoyan; Ma, Yuanle

    The rapid development of parallel computers has provided a hardware basis for massively refined reservoir simulation. However, the lack of parallel reservoir simulation software has blocked the application of parallel computers to reservoir simulation. Although a variety of parallel methods have been studied and applied to black oil, compositional, and chemical model numerical simulations, there has been limited parallel software available for reservoir simulation. In particular, the parallelization of reservoir thermal recovery simulation has not been fully carried out, because of the complexity of its models and algorithms. The authors make use of the message passing interface (MPI) standard communication library, the domain decomposition method, the block Jacobi iteration algorithm, and the dynamic memory allocation technique to parallelize their serial thermal recovery simulation software NUMSIP, which is being used in the petroleum industry in China. The parallel software PNUMSIP was tested on both IBM SP2 and Dawn 1000A distributed-memory parallel computers. The experimental results show that the parallelization of I/O has a great effect on the efficiency of the parallel software PNUMSIP; the data communication bandwidth is also an important factor influencing software efficiency. Keywords: domain decomposition method, block Jacobi iteration algorithm, reservoir thermal recovery simulation, distributed-memory parallel computer
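
    A minimal numerical sketch of the block Jacobi iteration named above: each block (one per processor in the real code) repeatedly solves its own diagonal subsystem, using the previous iterate for the off-block coupling terms.

    ```python
    # Block Jacobi iteration for Ax = b on a diagonally dominant test matrix.
    import numpy as np

    rng = np.random.default_rng(4)
    N, NB = 64, 4                                     # unknowns, blocks
    A = rng.standard_normal((N, N)) + N * np.eye(N)   # diagonally dominant
    b = rng.standard_normal(N)
    blocks = np.array_split(np.arange(N), NB)

    x = np.zeros(N)
    for it in range(50):
        x_new = x.copy()
        for idx in blocks:                   # concurrently on real processors
            Aii = A[np.ix_(idx, idx)]
            r = b[idx] - A[idx] @ x + Aii @ x[idx]   # drop own-block coupling
            x_new[idx] = np.linalg.solve(Aii, r)
        x = x_new
    print("residual:", np.linalg.norm(A @ x - b))
    ```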

  7. Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

    ERIC Educational Resources Information Center

    Agarwal, Mayank

    2009-01-01

    The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…

  9. Coordination in serial-parallel image processing

    NASA Astrophysics Data System (ADS)

    Wójcik, Waldemar; Dubovoi, Vladymyr M.; Duda, Marina E.; Romaniuk, Ryszard S.; Yesmakhanova, Laura; Kozbakova, Ainur

    2015-12-01

    Serial-parallel systems are used to transform images, and controlling their operation gives rise to a coordination problem. The paper summarizes a model of coordinated resource allocation for the task of synchronizing parallel processes; a genetic coordination algorithm is developed, and its adequacy is verified on the process of parallel image processing.

  10. Detection of multiple sinusoids using a parallel ALE

    SciTech Connect

    David, R.A.

    1984-01-01

    This paper introduces an Adaptive Line Enhancer (ALE) whose parallel structure enables the detection and enhancement of multiple sinusoids. A function describing the performance surface is derived for the case where several line signals are buried in white noise. A steepest descent adaptive algorithm is derived, and simulations are used to demonstrate its performance.
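
    A single ALE adapted by LMS, a steepest-descent algorithm, is easy to sketch; the paper's contribution is a parallel bank of such enhancers, which this toy version does not reproduce, and all parameters are illustrative.

    ```python
    # One adaptive line enhancer: predict the current sample from delayed
    # samples; the periodic component is predictable across the delay, the
    # white noise is not, so the output converges toward the enhanced line.
    import numpy as np

    rng = np.random.default_rng(5)
    n = np.arange(8000)
    x = np.sin(0.2 * np.pi * n) + 0.8 * rng.standard_normal(n.size)  # line + noise

    TAPS, DELAY, MU = 32, 1, 1e-3
    w = np.zeros(TAPS)
    y = np.zeros(n.size)
    for k in range(TAPS + DELAY, n.size):
        ref = x[k - DELAY - TAPS:k - DELAY][::-1]   # delayed reference vector
        y[k] = w @ ref                              # enhancer output (prediction)
        w += MU * (x[k] - y[k]) * ref               # LMS weight update

    print("output/input power ratio:", np.var(y[-2000:]) / np.var(x[-2000:]))
    ```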

  11. Xyce parallel electronic simulator design.

    SciTech Connect

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the ground up to be a SPICE-compatible, distributed-memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator, so having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines, and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development was primarily circuits for nuclear weapons. However, this has not been the only focus, and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort involving a number of researchers, engineers, scientists, mathematicians, and computer scientists. In addition to this diversity of background, long-term projects can expect a certain amount of staff turnover as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document in one place a number of the software quality practices followed by the Xyce team. It is also hoped that this document will be a good source of information for new developers.

  12. A massively asynchronous, parallel brain

    PubMed Central

    Zeki, Semir

    2015-01-01

    Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously—with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain. PMID:25823871

  13. Parallel job-scheduling algorithms

    SciTech Connect

    Rodger, S.H.

    1989-01-01

    In this thesis, we consider solving job scheduling problems on the CREW PRAM model. We show how to adapt Cole's pipeline merge technique to yield several efficient parallel algorithms for a number of job scheduling problems and one optimal parallel algorithm for the following job scheduling problem: Given a set of n jobs defined by release times, deadlines and processing times, find a schedule that minimizes the maximum lateness of the jobs and allows preemption when the jobs are scheduled to run on one machine. In addition, we present the first NC algorithm for the following job scheduling problem: Given a set of n jobs defined by release times, deadlines and unit processing times, determine if there is a schedule of jobs on one machine, and calculate the schedule if it exists. We identify the notion of a canonical schedule, which is the type of schedule our algorithm computes if there is a schedule. Our algorithm runs in O((log n)^2) time and uses O(n^2 k^2) processors, where k is the minimum number of distinct offsets of release times or deadlines.
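
    For orientation, the first problem has a classical serial solution, preemptive earliest-deadline-first (EDF), which is what the thesis parallelizes; a compact reference implementation:

    ```python
    # Preemptive EDF on one machine: optimal for minimizing maximum lateness
    # of jobs with release times.  jobs: list of (release, deadline, processing).
    import heapq

    def edf_max_lateness(jobs):
        jobs = sorted(jobs)                       # by release time
        t, i, ready, finish, remaining = 0, 0, [], {}, {}
        while i < len(jobs) or ready:
            if not ready:
                t = max(t, jobs[i][0])            # idle until next release
            while i < len(jobs) and jobs[i][0] <= t:
                r, d, p = jobs[i]
                heapq.heappush(ready, (d, i)); remaining[i] = p; i += 1
            d, j = heapq.heappop(ready)           # earliest deadline first
            # Run job j until it finishes or the next release preempts it.
            horizon = jobs[i][0] if i < len(jobs) else float("inf")
            run = min(remaining[j], horizon - t)
            t += run; remaining[j] -= run
            if remaining[j] == 0:
                finish[j] = t
            else:
                heapq.heappush(ready, (d, j))     # preempted, back to the queue
        return max(finish[j] - jobs[j][1] for j in finish)

    print(edf_max_lateness([(0, 10, 4), (1, 4, 2), (3, 6, 1)]))   # -> -1
    ```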

  14. Implementation and performance of parallelized elegant.

    SciTech Connect

    Wang, Y.; Borland, M.; Accelerator Systems Division

    2008-01-01

    The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.

  15. Aligning multiple protein sequences by parallel hybrid genetic algorithm.

    PubMed

    Nguyen, Hung Dinh; Yoshihara, Ikuo; Yamamori, Kunihito; Yasunaga, Moritoshi

    2002-01-01

    This paper presents a parallel hybrid genetic algorithm (GA) for solving the sum-of-pairs multiple protein sequence alignment. A new chromosome representation and its corresponding genetic operators are proposed. A multi-population GENITOR-type GA is combined with local search heuristics. It is then extended to run in parallel on a multiprocessor system for speeding up. Experimental results of benchmarks from the BAliBASE show that the proposed method is superior to MSA, OMA, and SAGA methods with regard to quality of solution and running time. It can be used for finding multiple sequence alignment as well as testing cost functions.

  16. Analysis of the Rotopod: An all revolute parallel manipulator

    SciTech Connect

    Schmitt, D.J.; Benavides, G.L.; Bieg, L.F.; Kozlowski, D.M.

    1998-05-16

    This paper introduces a new configuration of parallel manipulator called the Rotopod, which is constructed from all revolute-type joints. The Rotopod consists of two platforms connected by six legs and exhibits six Cartesian degrees of freedom. The Rotopod is first compared with other all-revolute-joint parallel manipulators to show its similarities and differences. The inverse kinematics for this mechanism are developed and used to analyze its accessible workspace. Optimization is performed to determine the Rotopod design configurations that maximize the accessible workspace subject to desirable functional constraints.

  17. Development of massively parallel quantum chemistry program SMASH

    SciTech Connect

    Ishimura, Kazuya

    2015-12-31

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  18. Development of massively parallel quantum chemistry program SMASH

    NASA Astrophysics Data System (ADS)

    Ishimura, Kazuya

    2015-12-01

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.
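
    The quoted speed-up corresponds to a parallel efficiency of roughly 51% at full machine scale:

    $$ E \;=\; \frac{S}{p} \;=\; \frac{50{,}499}{98{,}304} \;\approx\; 0.514. $$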

  19. Parallel Monte Carlo Simulation for control system design

    NASA Technical Reports Server (NTRS)

    Schubert, Wolfgang M.

    1995-01-01

    The research during the 1993/94 academic year addressed the design of parallel algorithms for stochastic robustness synthesis (SRS). SRS uses Monte Carlo simulation to compute probabilities of system instability and other design-metric violations. The probabilities form a cost function which is used by a genetic algorithm (GA). The GA searches for the stochastic optimal controller. The existing sequential algorithm was analyzed and modified to execute in a distributed environment. For this, parallel approaches to Monte Carlo simulation and genetic algorithms were investigated. Initial empirical results are available for the KSR1.
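
    A hedged sketch of the cost-function core described above: a Monte Carlo estimate of the probability of closed-loop instability under parameter uncertainty, evaluated across a process pool. The plant, gains, and distributions are invented for illustration.

    ```python
    # Parallel Monte Carlo estimate of P(instability) for a toy second-order
    # closed loop s^2 + (a1 + k1) s + (a0 + k0) with uncertain a1, a0.
    import numpy as np
    from multiprocessing import Pool

    def unstable(seed, k=(2.0, 1.0)):
        rng = np.random.default_rng(seed)
        a1 = rng.normal(1.0, 0.5)            # uncertain plant coefficients
        a0 = rng.normal(-1.0, 0.5)
        roots = np.roots([1.0, a1 + k[0], a0 + k[1]])
        return bool(np.any(roots.real >= 0.0))

    if __name__ == "__main__":
        N = 100_000
        with Pool() as pool:
            flags = pool.map(unstable, range(N), chunksize=1000)
        print("P(instability) ~", sum(flags) / N)   # the GA's cost ingredient
    ```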

  20. Local and nonlocal parallel heat transport in general magnetic fields

    SciTech Connect

    Del-Castillo-Negrete, Diego B; Chacon, Luis

    2011-01-01

    A novel approach for the study of parallel transport in magnetized plasmas is presented. The method avoids numerical pollution issues of grid-based formulations and applies to integrable and chaotic magnetic fields with local or nonlocal parallel closures. In weakly chaotic fields, the method gives the fractal structure of the devil's staircase radial temperature profile. In fully chaotic fields, the temperature exhibits self-similar spatiotemporal evolution with a stretched-exponential scaling function for local closures and an algebraically decaying one for nonlocal closures. It is shown that, for both closures, the effective radial heat transport is incompatible with the quasilinear diffusion model.