Science.gov

Sample records for not4p functions parallel

  1. Functional MRI using regularized parallel imaging acquisition.

    PubMed

    Lin, Fa-Hsuan; Huang, Teng-Yi; Chen, Nan-Kuei; Wang, Fu-Nien; Stufflebeam, Steven M; Belliveau, John W; Wald, Lawrence L; Kwong, Kenneth K

    2005-08-01

    Parallel MRI techniques reconstruct full-FOV images from undersampled k-space data by using the uncorrelated information from RF array coil elements. One disadvantage of parallel MRI is that the image signal-to-noise ratio (SNR) is degraded because of the reduced data samples and the spatially correlated nature of multiple RF receivers. Regularization has been proposed to mitigate the SNR loss originating due to the latter reason. Since it is necessary to utilize static prior to regularization, the dynamic contrast-to-noise ratio (CNR) in parallel MRI will be affected. In this paper we investigate the CNR of regularized sensitivity encoding (SENSE) acquisitions. We propose to implement regularized parallel MRI acquisitions in functional MRI (fMRI) experiments by incorporating the prior from combined segmented echo-planar imaging (EPI) acquisition into SENSE reconstructions. We investigated the impact of regularization on the CNR by performing parametric simulations at various BOLD contrasts, acceleration rates, and sizes of the active brain areas. As quantified by receiver operating characteristic (ROC) analysis, the simulations suggest that the detection power of SENSE fMRI can be improved by regularized reconstructions, compared to unregularized reconstructions. Human motor and visual fMRI data acquired at different field strengths and array coils also demonstrate that regularized SENSE improves the detection of functionally active brain regions.

  2. Functional MRI Using Regularized Parallel Imaging Acquisition

    PubMed Central

    Lin, Fa-Hsuan; Huang, Teng-Yi; Chen, Nan-Kuei; Wang, Fu-Nien; Stufflebeam, Steven M.; Belliveau, John W.; Wald, Lawrence L.; Kwong, Kenneth K.

    2013-01-01

    Parallel MRI techniques reconstruct full-FOV images from undersampled k-space data by using the uncorrelated information from RF array coil elements. One disadvantage of parallel MRI is that the image signal-to-noise ratio (SNR) is degraded because of the reduced data samples and the spatially correlated nature of multiple RF receivers. Regularization has been proposed to mitigate the SNR loss originating due to the latter reason. Since it is necessary to utilize static prior to regularization, the dynamic contrast-to-noise ratio (CNR) in parallel MRI will be affected. In this paper we investigate the CNR of regularized sensitivity encoding (SENSE) acquisitions. We propose to implement regularized parallel MRI acquisitions in functional MRI (fMRI) experiments by incorporating the prior from combined segmented echo-planar imaging (EPI) acquisition into SENSE reconstructions. We investigated the impact of regularization on the CNR by performing parametric simulations at various BOLD contrasts, acceleration rates, and sizes of the active brain areas. As quantified by receiver operating characteristic (ROC) analysis, the simulations suggest that the detection power of SENSE fMRI can be improved by regularized reconstructions, compared to unregularized reconstructions. Human motor and visual fMRI data acquired at different field strengths and array coils also demonstrate that regularized SENSE improves the detection of functionally active brain regions. PMID:16032694

  3. Massively parallel density functional calculations for thousands of atoms: KKRnano

    NASA Astrophysics Data System (ADS)

    Thiess, A.; Zeller, R.; Bolten, M.; Dederichs, P. H.; Blügel, S.

    2012-06-01

    Applications of existing precise electronic-structure methods based on density functional theory are typically limited to the treatment of about 1000 inequivalent atoms, which leaves unresolved many open questions in material science, e.g., on complex defects, interfaces, dislocations, and nanostructures. KKRnano is a new massively parallel linear scaling all-electron density functional algorithm in the framework of the Korringa-Kohn-Rostoker (KKR) Green's-function method. We conceptualized, developed, and optimized KKRnano for large-scale applications of many thousands of atoms without compromising on the precision of a full-potential all-electron method, i.e., it is a method without any shape approximation of the charge density or potential. A key element of the new method is the iterative solution of the sparse linear Dyson equation, which we parallelized atom by atom, across energy points in the complex plane and for each spin degree of freedom using the message passing interface standard, followed by a lower-level OpenMP parallelization. This hybrid four-level parallelization allows for an efficient use of up to 100000 processors on the latest generation of supercomputers. The iterative solution of the Dyson equation is significantly accelerated, employing preconditioning techniques making use of coarse-graining principles expressed in a block-circulant preconditioner. In this paper, we will describe the important elements of this new algorithm, focusing on the parallelization and preconditioning and showing scaling results for NiPd alloys up to 8192 atoms and 65536 processors. At the end, we present an order-N algorithm for large-scale simulations of metallic systems, making use of the nearsighted principle of the KKR Green's-function approach by introducing a truncation of the electron scattering to a local cluster of atoms, the size of which is determined by the requested accuracy. By exploiting this algorithm, we show linear scaling calculations of more

  4. Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments

    NASA Astrophysics Data System (ADS)

    Atwal, Gurinder S.; Kinney, Justin B.

    2016-03-01

    A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.

  5. Administering truncated receive functions in a parallel messaging interface

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-12-09

    Administering truncated receive functions in a parallel messaging interface (`PMI`) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.

  6. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    SciTech Connect

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single stranded oligonucleotides affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  7. Functional specialization of parallel motion detection circuits in the fly.

    PubMed

    Joesch, Maximilian; Weber, Franz; Eichner, Hubert; Borst, Alexander

    2013-01-16

    In the fly Drosophila melanogaster, photoreceptor input to motion vision is split into two parallel pathways as represented by first-order interneurons L1 and L2 (Rister et al., 2007; Joesch et al., 2010). However, how these pathways are functionally specialized remains controversial. One study (Eichner et al., 2011) proposed that the L1-pathway evaluates only sequences of brightness increments (ON-ON), while the L2-pathway processes exclusively brightness decrements (OFF-OFF). Another study (Clark et al., 2011) proposed that each of the two pathways evaluates both ON-ON and OFF-OFF sequences. To decide between these alternatives, we recorded from motion-sensitive neurons in flies in which the output from either L1 or L2 was genetically blocked. We found that blocking L1 abolishes ON-ON responses but leaves OFF-OFF responses intact. The opposite was true, when the output from L2 was blocked. We conclude that the L1 and L2 pathways are functionally specialized to detect ON-ON and OFF-OFF sequences, respectively.

  8. Parallel functional programming in Sisal: Fictions, facts, and future

    SciTech Connect

    McGraw, J.R.

    1993-07-01

    This paper provides a status report on the progress of research and development on the functional language Sisal. This project focuses on providing a highly effective method of writing large scientific applications that can efficiently execute on a spectrum of different multiprocessors. The paper includes sections on the language definition, compilation strategies, and programming techniques intended for readers with little or no background with Sisal. The section on performance presents our most recent results on execution speed for shared-memory multiprocessors, our findings using Sisal to develop codes, and our experiences migrating the same source code to different machines. For large programs, the execution performance of Sisal (with minimal supporting advice from the programmer) usually exceeds that of the best available automatic, vector/parallel Fortran compilers. Our evidence also indicates that Sisal programs tend to be shorter in length, faster to write, and dearer to understand than equivalent algorithms in Fortran. The paper concludes with a substantial discussion of common criticisms of the language and our plans for addressing them. Most notably, efficient implementations for distributed memory machines are lacking; an issue we plan to remedy.

  9. A two-level parallel direct search implementation for arbitrarily sized objective functions

    SciTech Connect

    Hutchinson, S.A.; Shadid, N.; Moffat, H.K.

    1994-12-31

    In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p{sub 2}) of processors. If, p, the number of processors available, is greater than or equal to 2p{sub 2} then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.

  10. Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments.

    PubMed

    Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke E

    2017-02-14

    Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode, to explain the maximum variance of the data. Functional PARAFAC permits the entities in different modes to be smooth functions or curves, varying over a continuum, rather than a collection of unconnected responses. The existing functional PARAFAC methods handle functions of a one-dimensional argument (e.g., time) only. In this paper, we propose a new extension of functional PARAFAC for handling three-way data whose responses are sequenced along both a two-dimensional domain (e.g., a plane with x- and y-axis coordinates) and a one-dimensional argument. Technically, the proposed method combines PARAFAC with basis function expansion approximations, using a set of piecewise quadratic finite element basis functions for estimating two-dimensional smooth functions and a set of one-dimensional basis functions for estimating one-dimensional smooth functions. In a simulation study, the proposed method appeared to outperform the conventional PARAFAC. We apply the method to EEG data to demonstrate its empirical usefulness.

  11. Efficient time-dependent density functional theory approximations for hybrid density functionals: Analytical gradients and parallelization

    NASA Astrophysics Data System (ADS)

    Petrenko, Taras; Kossmann, Simone; Neese, Frank

    2011-02-01

    In this paper, we present the implementation of efficient approximations to time-dependent density functional theory (TDDFT) within the Tamm-Dancoff approximation (TDA) for hybrid density functionals. For the calculation of the TDDFT/TDA excitation energies and analytical gradients, we combine the resolution of identity (RI-J) algorithm for the computation of the Coulomb terms and the recently introduced "chain of spheres exchange" (COSX) algorithm for the calculation of the exchange terms. It is shown that for extended basis sets, the RIJCOSX approximation leads to speedups of up to 2 orders of magnitude compared to traditional methods, as demonstrated for hydrocarbon chains. The accuracy of the adiabatic transition energies, excited state structures, and vibrational frequencies is assessed on a set of 27 excited states for 25 molecules with the configuration interaction singles and hybrid TDDFT/TDA methods using various basis sets. Compared to the canonical values, the typical error in transition energies is of the order of 0.01 eV. Similar to the ground-state results, excited state equilibrium geometries differ by less than 0.3 pm in the bond distances and 0.5° in the bond angles from the canonical values. The typical error in the calculated excited state normal coordinate displacements is of the order of 0.01, and relative error in the calculated excited state vibrational frequencies is less than 1%. The errors introduced by the RIJCOSX approximation are, thus, insignificant compared to the errors related to the approximate nature of the TDDFT methods and basis set truncation. For TDDFT/TDA energy and gradient calculations on Ag-TB2-helicate (156 atoms, 2732 basis functions), it is demonstrated that the COSX algorithm parallelizes almost perfectly (speedup ˜26-29 for 30 processors). The exchange-correlation terms also parallelize well (speedup ˜27-29 for 30 processors). The solution of the Z-vector equations shows a speedup of ˜24 on 30 processors. The

  12. Efficient time-dependent density functional theory approximations for hybrid density functionals: analytical gradients and parallelization.

    PubMed

    Petrenko, Taras; Kossmann, Simone; Neese, Frank

    2011-02-07

    In this paper, we present the implementation of efficient approximations to time-dependent density functional theory (TDDFT) within the Tamm-Dancoff approximation (TDA) for hybrid density functionals. For the calculation of the TDDFT/TDA excitation energies and analytical gradients, we combine the resolution of identity (RI-J) algorithm for the computation of the Coulomb terms and the recently introduced "chain of spheres exchange" (COSX) algorithm for the calculation of the exchange terms. It is shown that for extended basis sets, the RIJCOSX approximation leads to speedups of up to 2 orders of magnitude compared to traditional methods, as demonstrated for hydrocarbon chains. The accuracy of the adiabatic transition energies, excited state structures, and vibrational frequencies is assessed on a set of 27 excited states for 25 molecules with the configuration interaction singles and hybrid TDDFT/TDA methods using various basis sets. Compared to the canonical values, the typical error in transition energies is of the order of 0.01 eV. Similar to the ground-state results, excited state equilibrium geometries differ by less than 0.3 pm in the bond distances and 0.5° in the bond angles from the canonical values. The typical error in the calculated excited state normal coordinate displacements is of the order of 0.01, and relative error in the calculated excited state vibrational frequencies is less than 1%. The errors introduced by the RIJCOSX approximation are, thus, insignificant compared to the errors related to the approximate nature of the TDDFT methods and basis set truncation. For TDDFT/TDA energy and gradient calculations on Ag-TB2-helicate (156 atoms, 2732 basis functions), it is demonstrated that the COSX algorithm parallelizes almost perfectly (speedup ~26-29 for 30 processors). The exchange-correlation terms also parallelize well (speedup ~27-29 for 30 processors). The solution of the Z-vector equations shows a speedup of ~24 on 30 processors. The

  13. PDoublePop: An implementation of parallel genetic algorithm for function optimization

    NASA Astrophysics Data System (ADS)

    Tsoulos, Ioannis G.; Tzallas, Alexandros; Tsalikakis, Dimitris

    2016-12-01

    A software for the implementation of parallel genetic algorithms is presented in this article. The underlying genetic algorithm is aimed to locate the global minimum of a multidimensional function inside a rectangular hyperbox. The proposed software named PDoublePop implements a client-server model for parallel genetic algorithms with advanced features for the local genetic algorithms such as: an enhanced stopping rule, an advanced mutation scheme and periodical application of a local search procedure. The user may code the objective function either in C++ or in Fortran77. The method is tested on a series of well-known test functions and the results are reported.

  14. GPU-based parallel group ICA for functional magnetic resonance data.

    PubMed

    Jing, Yanshan; Zeng, Weiming; Wang, Nizhuan; Ren, Tianlong; Shi, Yingchao; Yin, Jun; Xu, Qi

    2015-04-01

    The goal of our study is to develop a fast parallel implementation of group independent component analysis (ICA) for functional magnetic resonance imaging (fMRI) data using graphics processing units (GPU). Though ICA has become a standard method to identify brain functional connectivity of the fMRI data, it is computationally intensive, especially has a huge cost for the group data analysis. GPU with higher parallel computation power and lower cost are used for general purpose computing, which could contribute to fMRI data analysis significantly. In this study, a parallel group ICA (PGICA) on GPU, mainly consisting of GPU-based PCA using SVD and Infomax-ICA, is presented. In comparison to the serial group ICA, the proposed method demonstrated both significant speedup with 6-11 times and comparable accuracy of functional networks in our experiments. This proposed method is expected to perform the real-time post-processing for fMRI data analysis.

  15. Parallelization of the polarizable embedding scheme for higher-order response functions

    NASA Astrophysics Data System (ADS)

    Hykkerud Steindal, Arnfinn; Magnus Haugaard Olsen, Jógvan; Frediani, Luca; Kongsted, Jacob; Ruud, Kenneth

    2012-10-01

    We present a parallel implementation of the Polarizable Embedding (PE) method, an advanced quantum mechanics/molecular mechanics (QM/MM) approach, for Hartree-Fock (PE-HF) and density functional theory (PE-DFT). The parallelization includes calculations of energies and linear, quadratic, and cubic response functions. The couplings to the QM system due to the polarizable embedding potential have been implemented using a master/slave approach. The implementation shows good scaling behaviour, demonstrated through calculations on a small (a water molecule in a bulk of water molecules) and a larger system (Green Fluorescent Protein (GFP)).

  16. Method, systems, and computer program products for implementing function-parallel network firewall

    DOEpatents

    Fulp, Errin W [Winston-Salem, NC; Farley, Ryan J [Winston-Salem, NC

    2011-10-11

    Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.

  17. Parallel sites implicate functional convergence of the hearing gene prestin among echolocating mammals.

    PubMed

    Liu, Zhen; Qi, Fei-Yan; Zhou, Xin; Ren, Hai-Qing; Shi, Peng

    2014-09-01

    Echolocation is a sensory system whereby certain mammals navigate and forage using sound waves, usually in environments where visibility is limited. Curiously, echolocation has evolved independently in bats and whales, which occupy entirely different environments. Based on this phenotypic convergence, recent studies identified several echolocation-related genes with parallel sites at the protein sequence level among different echolocating mammals, and among these, prestin seems the most promising. Although previous studies analyzed the evolutionary mechanism of prestin, the functional roles of the parallel sites in the evolution of mammalian echolocation are not clear. By functional assays, we show that a key parameter of prestin function, 1/α, is increased in all echolocating mammals and that the N7T parallel substitution accounted for this functional convergence. Moreover, another parameter, V1/2, was shifted toward the depolarization direction in a toothed whale, the bottlenose dolphin (Tursiops truncatus) and a constant-frequency (CF) bat, the Stoliczka's trident bat (Aselliscus stoliczkanus). The parallel site of I384T between toothed whales and CF bats was responsible for this functional convergence. Furthermore, the two parameters (1/α and V1/2) were correlated with mammalian high-frequency hearing, suggesting that the convergent changes of the prestin function in echolocating mammals may play important roles in mammalian echolocation. To our knowledge, these findings present the functional patterns of echolocation-related genes in echolocating mammals for the first time and rigorously demonstrate adaptive parallel evolution at the protein sequence level, paving the way to insights into the molecular mechanism underlying mammalian echolocation.

  18. Parallel-META 3: Comprehensive taxonomical and functional analysis platform for efficient comparison of microbial communities.

    PubMed

    Jing, Gongchao; Sun, Zheng; Wang, Honglei; Gong, Yanhai; Huang, Shi; Ning, Kang; Xu, Jian; Su, Xiaoquan

    2017-01-12

    The number of metagenomes is increasing rapidly. However, current methods for metagenomic analysis are limited by their capability for in-depth data mining among a large number of microbiome each of which carries a complex community structure. Moreover, the complexity of configuring and operating computational pipeline also hinders efficient data processing for the end users. In this work we introduce Parallel-META 3, a comprehensive and fully automatic computational toolkit for rapid data mining among metagenomic datasets, with advanced features including 16S rRNA extraction for shotgun sequences, 16S rRNA copy number calibration, 16S rRNA based functional prediction, diversity statistics, bio-marker selection, interaction network construction, vector-graph-based visualization and parallel computing. Application of Parallel-META 3 on 5,337 samples with 1,117,555,208 sequences from diverse studies and platforms showed it could produce similar results as QIIME and PICRUSt with much faster speed and lower memory usage, which demonstrates its ability to unravel the taxonomical and functional dynamics patterns across large datasets and elucidate ecological links between microbiome and the environment. Parallel-META 3 is implemented in C/C++ and R, and integrated into an executive package for rapid installation and easy access under Linux and Mac OS X. Both binary and source code packages are available at http://bioinfo.single-cell.cn/parallel-meta.html.

  19. Parallel-META 3: Comprehensive taxonomical and functional analysis platform for efficient comparison of microbial communities

    PubMed Central

    Jing, Gongchao; Sun, Zheng; Wang, Honglei; Gong, Yanhai; Huang, Shi; Ning, Kang; Xu, Jian; Su, Xiaoquan

    2017-01-01

    The number of metagenomes is increasing rapidly. However, current methods for metagenomic analysis are limited by their capability for in-depth data mining among a large number of microbiome each of which carries a complex community structure. Moreover, the complexity of configuring and operating computational pipeline also hinders efficient data processing for the end users. In this work we introduce Parallel-META 3, a comprehensive and fully automatic computational toolkit for rapid data mining among metagenomic datasets, with advanced features including 16S rRNA extraction for shotgun sequences, 16S rRNA copy number calibration, 16S rRNA based functional prediction, diversity statistics, bio-marker selection, interaction network construction, vector-graph-based visualization and parallel computing. Application of Parallel-META 3 on 5,337 samples with 1,117,555,208 sequences from diverse studies and platforms showed it could produce similar results as QIIME and PICRUSt with much faster speed and lower memory usage, which demonstrates its ability to unravel the taxonomical and functional dynamics patterns across large datasets and elucidate ecological links between microbiome and the environment. Parallel-META 3 is implemented in C/C++ and R, and integrated into an executive package for rapid installation and easy access under Linux and Mac OS X. Both binary and source code packages are available at http://bioinfo.single-cell.cn/parallel-meta.html. PMID:28079128

  20. Extending the functionalities of Cartesian grid solvers: Viscous effects modeling and MPI parallelization

    NASA Astrophysics Data System (ADS)

    Marshall, David D.

    With the renewed interest in Cartesian gridding methodologies for the ease and speed of gridding complex geometries in addition to the simplicity of the control volumes used in the computations, it has become important to investigate ways of extending the existing Cartesian grid solver functionalities. This includes developing methods of modeling the viscous effects in order to utilize Cartesian grids solvers for accurate drag predictions and addressing the issues related to the distributed memory parallelization of Cartesian solvers. This research presents advances in two areas of interest in Cartesian grid solvers, viscous effects modeling and MPI parallelization. The development of viscous effects modeling using solely Cartesian grids has been hampered by the widely varying control volume sizes associated with the mesh refinement and the cut cells associated with the solid surface. This problem is being addressed by using physically based modeling techniques to update the state vectors of the cut cells and removing them from the finite volume integration scheme. This work is performed on a new Cartesian grid solver, NASCART-GT, with modifications to its cut cell functionality. The development of MPI parallelization addresses issues associated with utilizing Cartesian solvers on distributed memory parallel environments. This work is performed on an existing Cartesian grid solver, CART3D, with modifications to its parallelization methodology.

  1. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    Charon is a software toolkit that enables engineers to develop high-performing message-passing programs in a convenient and piecemeal fashion. Emphasis is on rapid program development and prototyping. In this report a detailed description of the functional design of the toolkit is presented. It is illustrated by the stepwise parallelization of two representative code examples.

  2. Analysis and selection of optimal function implementations in massively parallel computer

    DOEpatents

    Archer, Charles Jens; Peters, Amanda; Ratterman, Joseph D.

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.

  3. DGDFT: A massively parallel method for large scale density functional theory calculations

    SciTech Connect

    Hu, Wei Yang, Chao; Lin, Lin

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10{sup −4} Hartree/atom in terms of the error of energy and 6.2 × 10{sup −4} Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  4. DGDFT: A massively parallel method for large scale density functional theory calculations.

    PubMed

    Hu, Wei; Lin, Lin; Yang, Chao

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10(-4) Hartree/atom in terms of the error of energy and 6.2 × 10(-4) Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  5. Storing files in a parallel computing system based on user-specified parser function

    DOEpatents

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

    2014-10-21

    Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.

  6. Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project.

    PubMed

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L

    2012-06-13

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  7. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    NASA Astrophysics Data System (ADS)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

    2012-06-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  8. Micro/Nanoscale Parallel Patterning of Functional Biomolecules, Organic Fluorophores and Colloidal Nanocrystals

    NASA Astrophysics Data System (ADS)

    Sabella, S.; Brunetti, V.; Vecchio, G.; Torre, A. Della; Rinaldi, R.; Cingolani, R.; Pompa, P. P.

    2009-10-01

    We describe the design and optimization of a reliable strategy that combines self-assembly and lithographic techniques, leading to very precise micro-/nanopositioning of biomolecules for the realization of micro- and nanoarrays of functional DNA and antibodies. Moreover, based on the covalent immobilization of stable and versatile SAMs of programmable chemical reactivity, this approach constitutes a general platform for the parallel site-specific deposition of a wide range of molecules such as organic fluorophores and water-soluble colloidal nanocrystals.

  9. Implementation of linear-scaling plane wave density functional theory on parallel computers

    NASA Astrophysics Data System (ADS)

    Skylaris, Chris-Kriton; Haynes, Peter D.; Mostofi, Arash A.; Payne, Mike C.

    We describe the algorithms we have developed for linear-scaling plane wave density functional calculations on parallel computers as implemented in the onetep program. We outline how onetep achieves plane wave accuracy with a computational cost which increases only linearly with the number of atoms by optimising directly the single-particle density matrix expressed in a psinc basis set. We describe in detail the novel algorithms we have developed for computing with the psinc basis set the quantities needed in the evaluation and optimisation of the total energy within our approach. For our parallel computations we use the general Message Passing Interface (MPI) library of subroutines to exchange data between processors. Accordingly, we have developed efficient schemes for distributing data and computational load to processors in a balanced manner. We describe these schemes in detail and in relation to our algorithms for computations with a psinc basis. Results of tests on different materials show that onetep is an efficient parallel code that should be able to take advantage of a wide range of parallel computer architectures.

  10. Parallel Fixed Point Implementation of a Radial Basis Function Network in an FPGA

    PubMed Central

    de Souza, Alisson C. D.; Fernandes, Marcelo A. C.

    2014-01-01

    This paper proposes a parallel fixed point radial basis function (RBF) artificial neural network (ANN), implemented in a field programmable gate array (FPGA) trained online with a least mean square (LMS) algorithm. The processing time and occupied area were analyzed for various fixed point formats. The problems of precision of the ANN response for nonlinear classification using the XOR gate and interpolation using the sine function were also analyzed in a hardware implementation. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA. PMID:25268918

  11. A Parallel Independent Component Analysis Approach to Investigate Genomic Influence on Brain Function

    PubMed Central

    Liu, Jingyu; Demirci, Oguz; Calhoun, Vince D.

    2009-01-01

    Relationships between genomic data and functional brain images are of great interest but require new analysis approaches to integrate the high-dimensional data types. This letter presents an extension of a technique called parallel independent component analysis (paraICA), which enables the joint analysis of multiple modalities including interconnections between them. We extend our earlier work by allowing for multiple interconnections and by providing important overfitting controls. Performance was assessed by simulations under different conditions, and indicated reliable results can be extracted by properly balancing overfitting and underfitting. An application to functional magnetic resonance images and single nucleotide polymorphism array produced interesting findings. PMID:19834575

  12. Parallel fixed point implementation of a radial basis function network in an FPGA.

    PubMed

    de Souza, Alisson C D; Fernandes, Marcelo A C

    2014-09-29

    This paper proposes a parallel fixed point radial basis function (RBF) artificial neural network (ANN), implemented in a field programmable gate array (FPGA) trained online with a least mean square (LMS) algorithm. The processing time and occupied area were analyzed for various fixed point formats. The problems of precision of the ANN response for nonlinear classification using the XOR gate and interpolation using the sine function were also analyzed in a hardware implementation. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA.

  13. Locating and computing in parallel all the simple roots of special functions using PVM

    NASA Astrophysics Data System (ADS)

    Plagianakos, V. P.; Nousis, N. K.; Vrahatis, M. N.

    2001-08-01

    An algorithm is proposed for locating and computing in parallel and with certainty all the simple roots of any twice continuously differentiable function in any specific interval. To compute with certainty all the roots, the proposed method is heavily based on the knowledge of the total number of roots within the given interval. To obtain this information we use results from topological degree theory and, in particular, the Kronecker-Picard approach. This theory gives a formula for the computation of the total number of roots of a system of equations within a given region, which can be computed in parallel. With this tool in hand, we construct a parallel procedure for the localization and isolation of all the roots by dividing the given region successively and applying the above formula to these subregions until the final domains contain at the most one root. The subregions with no roots are discarded, while for the rest a modification of the well-known bisection method is employed for the computation of the contained root. The new aspect of the present contribution is that the computation of the total number of zeros using the Kronecker-Picard integral as well as the localization and computation of all the roots is performed in parallel using the parallel virtual machine (PVM). PVM is an integrated set of software tools and libraries that emulates a general-purpose, flexible, heterogeneous concurrent computing framework on interconnected computers of varied architectures. The proposed algorithm has large granularity and low synchronization, and is robust. It has been implemented and tested and our experience is that it can massively compute with certainty all the roots in a certain interval. Performance information from massive computations related to a recently proposed conjecture due to Elbert (this issue, J. Comput. Appl. Math. 133 (2001) 65-83) is reported.

  14. Parallel functional category deficits in clauses and nominal phrases: The case of English agrammatism

    PubMed Central

    Wang, Honglei; Yoshida, Masaya; Thompson, Cynthia K.

    2015-01-01

    Individuals with agrammatic aphasia exhibit restricted patterns of impairment of functional morphemes, however, syntactic characterization of the impairment is controversial. Previous studies have focused on functional morphology in clauses only. This study extends the empirical domain by testing functional morphemes in English nominal phrases in aphasia and comparing patients’ impairment to their impairment of functional morphemes in English clauses. In the linguistics literature, it is assumed that clauses and nominal phrases are structurally parallel but exhibit inflectional differences. The results of the present study indicated that aphasic speakers evinced similar impairment patterns in clauses and nominal phrases. These findings are consistent with the Distributed Morphology Hypothesis (DMH), suggesting that the source of functional morphology deficits among agrammatics relates to difficulty implementing rules that convert inflectional features into morphemes. Our findings, however, are inconsistent with the Tree Pruning Hypothesis (TPH), which suggests that patients have difficulty building complex hierarchical structures. PMID:26379370

  15. Parallel functional category deficits in clauses and nominal phrases: The case of English agrammatism.

    PubMed

    Wang, Honglei; Yoshida, Masaya; Thompson, Cynthia K

    2014-01-01

    Individuals with agrammatic aphasia exhibit restricted patterns of impairment of functional morphemes, however, syntactic characterization of the impairment is controversial. Previous studies have focused on functional morphology in clauses only. This study extends the empirical domain by testing functional morphemes in English nominal phrases in aphasia and comparing patients' impairment to their impairment of functional morphemes in English clauses. In the linguistics literature, it is assumed that clauses and nominal phrases are structurally parallel but exhibit inflectional differences. The results of the present study indicated that aphasic speakers evinced similar impairment patterns in clauses and nominal phrases. These findings are consistent with the Distributed Morphology Hypothesis (DMH), suggesting that the source of functional morphology deficits among agrammatics relates to difficulty implementing rules that convert inflectional features into morphemes. Our findings, however, are inconsistent with the Tree Pruning Hypothesis (TPH), which suggests that patients have difficulty building complex hierarchical structures.

  16. A parallel approach of COFFEE objective function to multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.

    2015-09-01

    The computational tools to assist genomic analyzes show even more necessary due to fast increasing of data amount available. With high computational costs of deterministic algorithms for sequence alignments, many works concentrate their efforts in the development of heuristic approaches to multiple sequence alignments. However, the selection of an approach, which offers solutions with good biological significance and feasible execution time, is a great challenge. Thus, this work aims to show the parallelization of the processing steps of MSA-GA tool using multithread paradigm in the execution of COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sequences sets with low similarity are aligned. Then, in studies previously performed we implemented the COFFEE objective function in the tool to smooth these distortions. Although the nature of COFFEE objective function implies in the increasing of execution time, this approach presents points, which can be executed in parallel. With the improvements implemented in this work, we can verify the execution time of new approach is 24% faster than the sequential approach with COFFEE. Moreover, the COFFEE multithreaded approach is more efficient than WSP, because besides it is slightly fast, its biological results are better.

  17. Finding zeros of nonlinear functions using the hybrid parallel cell mapping method

    NASA Astrophysics Data System (ADS)

    Xiong, Fu-Rui; Schütze, Oliver; Ding, Qian; Sun, Jian-Qiao

    2016-05-01

    Analysis of nonlinear dynamical systems including finding equilibrium states and stability boundaries often leads to a problem of finding zeros of vector functions. However, finding all the zeros of a set of vector functions in the domain of interest is quite a challenging task. This paper proposes a zero finding algorithm that combines the cell mapping methods and the subdivision techniques. Both the simple cell mapping (SCM) and generalized cell mapping (GCM) methods are used to identify a covering set of zeros. The subdivision technique is applied to enhance the solution resolution. The parallel implementation of the proposed method is discussed extensively. Several examples are presented to demonstrate the application and effectiveness of the proposed method. We then extend the study of finding zeros to the problem of finding stability boundaries of potential fields. Examples of two and three dimensional potential fields are studied. In addition to the effectiveness in finding the stability boundaries, the proposed method can handle several millions of cells in just a few seconds with the help of parallel computing in graphics processing units (GPUs).

  18. Parallel Execution of Functional Mock-up Units in Buildings Modeling

    SciTech Connect

    Ozmen, Ozgur; Nutaro, James J.; New, Joshua Ryan

    2016-06-30

    A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.

  19. Introducing ONETEP: linear-scaling density functional simulations on parallel computers.

    PubMed

    Skylaris, Chris-Kriton; Haynes, Peter D; Mostofi, Arash A; Payne, Mike C

    2005-02-22

    We present ONETEP (order-N electronic total energy package), a density functional program for parallel computers whose computational cost scales linearly with the number of atoms and the number of processors. ONETEP is based on our reformulation of the plane wave pseudopotential method which exploits the electronic localization that is inherent in systems with a nonvanishing band gap. We summarize the theoretical developments that enable the direct optimization of strictly localized quantities expressed in terms of a delocalized plane wave basis. These same localized quantities lead us to a physical way of dividing the computational effort among many processors to allow calculations to be performed efficiently on parallel supercomputers. We show with examples that ONETEP achieves excellent speedups with increasing numbers of processors and confirm that the time taken by ONETEP as a function of increasing number of atoms for a given number of processors is indeed linear. What distinguishes our approach is that the localization is achieved in a controlled and mathematically consistent manner so that ONETEP obtains the same accuracy as conventional cubic-scaling plane wave approaches and offers fast and stable convergence. We expect that calculations with ONETEP have the potential to provide quantitative theoretical predictions for problems involving thousands of atoms such as those often encountered in nanoscience and biophysics.

  20. Line-field parallel swept source MHz OCT for structural and functional retinal imaging.

    PubMed

    Fechtig, Daniel J; Grajciar, Branislav; Schmoll, Tilman; Blatter, Cedric; Werkmeister, Rene M; Drexler, Wolfgang; Leitgeb, Rainer A

    2015-03-01

    We demonstrate three-dimensional structural and functional retinal imaging with line-field parallel swept source imaging (LPSI) at acquisition speeds of up to 1 MHz equivalent A-scan rate with sensitivity better than 93.5 dB at a central wavelength of 840 nm. The results demonstrate competitive sensitivity, speed, image contrast and penetration depth when compared to conventional point scanning OCT. LPSI allows high-speed retinal imaging of function and morphology with commercially available components. We further demonstrate a method that mitigates the effect of the lateral Gaussian intensity distribution across the line focus and demonstrate and discuss the feasibility of high-speed optical angiography for visualization of the retinal microcirculation.

  1. Line-field parallel swept source MHz OCT for structural and functional retinal imaging

    PubMed Central

    Fechtig, Daniel J.; Grajciar, Branislav; Schmoll, Tilman; Blatter, Cedric; Werkmeister, Rene M.; Drexler, Wolfgang; Leitgeb, Rainer A.

    2015-01-01

    We demonstrate three-dimensional structural and functional retinal imaging with line-field parallel swept source imaging (LPSI) at acquisition speeds of up to 1 MHz equivalent A-scan rate with sensitivity better than 93.5 dB at a central wavelength of 840 nm. The results demonstrate competitive sensitivity, speed, image contrast and penetration depth when compared to conventional point scanning OCT. LPSI allows high-speed retinal imaging of function and morphology with commercially available components. We further demonstrate a method that mitigates the effect of the lateral Gaussian intensity distribution across the line focus and demonstrate and discuss the feasibility of high-speed optical angiography for visualization of the retinal microcirculation. PMID:25798298

  2. Parallelization of the integral equation formulation of the polarizable continuum model for higher-order response functions.

    PubMed

    Ferrighi, Lara; Frediani, Luca; Fossgaard, Eirik; Ruud, Kenneth

    2006-10-21

    We present a parallel implementation of the integral equation formalism of the polarizable continuum model for Hartree-Fock and density functional theory calculations of energies and linear, quadratic, and cubic response functions. The contributions to the free energy of the solute due to the polarizable continuum have been implemented using a master-slave approach with load balancing to ensure good scalability also on parallel machines with a slow interconnect. We demonstrate the good scaling behavior of the code through calculations of Hartree-Fock energies and linear, quadratic, and cubic response function for a modest-sized sample molecule. We also explore the behavior of the parallelization of the integral equation formulation of the polarizable continuum model code when used in conjunction with a recent scheme for the storage of two-electron integrals in the memory of the different slaves in order to achieve superlinear scaling in the parallel calculations.

  3. Parallelization of the integral equation formulation of the polarizable continuum model for higher-order response functions

    NASA Astrophysics Data System (ADS)

    Ferrighi, Lara; Frediani, Luca; Fossgaard, Eirik; Ruud, Kenneth

    2006-10-01

    We present a parallel implementation of the integral equation formalism of the polarizable continuum model for Hartree-Fock and density functional theory calculations of energies and linear, quadratic, and cubic response functions. The contributions to the free energy of the solute due to the polarizable continuum have been implemented using a master-slave approach with load balancing to ensure good scalability also on parallel machines with a slow interconnect. We demonstrate the good scaling behavior of the code through calculations of Hartree-Fock energies and linear, quadratic, and cubic response function for a modest-sized sample molecule. We also explore the behavior of the parallelization of the integral equation formulation of the polarizable continuum model code when used in conjunction with a recent scheme for the storage of two-electron integrals in the memory of the different slaves in order to achieve superlinear scaling in the parallel calculations.

  4. Large-scale parallel surface functionalization of goblet-type whispering gallery mode microcavity arrays for biosensing applications.

    PubMed

    Bog, Uwe; Brinkmann, Falko; Kalt, Heinz; Koos, Christian; Mappes, Timo; Hirtz, Michael; Fuchs, Harald; Köber, Sebastian

    2014-10-15

    A novel surface functionalization technique is presented for large-scale selective molecule deposition onto whispering gallery mode microgoblet cavities. The parallel technique allows damage-free individual functionalization of the cavities, arranged on-chip in densely packaged arrays. As the stamp pad a glass slide is utilized, bearing phospholipids with different functional head groups. Coated microcavities are characterized and demonstrated as biosensors.

  5. A parallel approach for image segmentation by numerical minimization of a second-order functional

    NASA Astrophysics Data System (ADS)

    Zanella, Riccardo; Zanetti, Massimo; Ruggiero, Valeria

    2016-10-01

    Because of its attractive features, image segmentation has shown to be a promising tool in remote sensing. A known drawback about its implementation is computational complexity. Recently in [1] an effcient numerical method has been proposed for the minimization of a second-order variational approximation of the Blake-Zissermann functional. The method is an especially tailored version of the block-coordinate descent algorithm (BCDA). In order to enable the segmentation of large-size gridded data, such as Digital Surface Models, we combine a domain decomposition technique with BCDA and a parallel interconnection rule among blocks of variables. We aim to show that a simple tiling strategy enables us to treat large images even in a commodity multicore CPU, with no need of specific post-processing on tiles junctions. From the point of view of the performance, little computational effort is required to separate data in subdomains and the running time is mainly spent in concurrently solving the independent subproblems. Numerical results are provided to evaluate the effectiveness of the proposed parallel approach.

  6. Parallel Loss-of-Function at the RPM1 Bacterial Resistance Locus in Arabidopsis thaliana

    PubMed Central

    Rose, Laura; Atwell, Susanna; Grant, Murray; Holub, Eric B.

    2012-01-01

    Dimorphism at the Resistance to Pseudomonas syringae pv. maculicola 1 (RPM1) locus is well documented in natural populations of Arabidopsis thaliana and has been portrayed as a long-term balanced polymorphism. The haplotype from resistant plants contains the RPM1 gene, which enables these plants to recognize at least two structurally unrelated bacterial effector proteins (AvrB and AvrRpm1) from bacterial crop pathogens. A complete deletion of the RPM1 coding sequence has been interpreted as a single event resulting in susceptibility in these individuals. Consequently, the ability to revert to resistance or for alternative R-gene specificities to evolve at this locus has also been lost in these individuals. Our survey of variation at the RPM1 locus in a large species-wide sample of A. thaliana has revealed four new loss-of-function alleles that contain most of the intervening sequence of the RPM1 open reading frame. Multiple loss-of-function alleles may have originated due to the reported intrinsic cost to plants expressing the RPM1 protein. The frequency and geographic distribution of rpm1 alleles observed in our survey indicate the parallel origin and maintenance of these loss-of-function mutations and reveal a more complex history of natural selection at this locus than previously thought. PMID:23272006

  7. Depth estimation via parallel coevolution of disparity functions for area-based stereo

    NASA Astrophysics Data System (ADS)

    Liatsis, Panos; Goulermas, John Y.

    2001-02-01

    12 A novel system for depth estimation is proposed with the use of Symbiotic Genetic Algorithms for the continuous problem of disparity surface approximation. The approach is based on the decomposition of the entire surface to very small non- overlapping patches described by low order bivariate polynomials and the use of symbiotic optimization to enforce smoothness at the boundaries of these patches, so that the entire surface can be approximated in a smooth piecewise fashion by functionals of local support. Such optimization is amenable to a massive parallel implementation, since each patch is optimized by a different execution unit and each unit communicates through its cost function only with its four-connected neighbors. The method makes use of various existing crossover and mutation schemes for real-valued chromosome representations and a new problem-specific mechanism for generating and hybridizing the initial populations. The proposed multi-objective cost function enforces photometric similarity and smoothness between the patch boundaries at a local scale, which in the long term give rise to a globally smooth disparity surface.

  8. Temporal increase in thymocyte negative selection parallels enhanced thymic SIRPα(+) DC function.

    PubMed

    Kroger, Charles J; Wang, Bo; Tisch, Roland

    2016-10-01

    Dysregulation of negative selection contributes to T-cell-mediated autoimmunity, such as type 1 diabetes. The events regulating thymic negative selection, however, are ill defined. Work by our group and others suggest that negative selection is inefficient early in ontogeny and increases with age. This study examines temporal changes in negative selection and the thymic DC compartment. Peptide-induced thymocyte deletion in vivo was reduced in newborn versus 4-week-old NOD mice, despite a similar sensitivity of the respective thymocytes to apoptosis induction. The temporal increase in negative selection corresponded with an elevated capacity of thymic antigen-presenting cells to stimulate T cells, along with altered subset composition and function of resident DC. The frequency of signal regulatory protein α(+) (SIRPα(+) ) and plasmacytoid DCs was increased concomitant with a decrease in CD8α(+) DC in 4-week-old NOD thymi. Importantly, 4-week-old versus newborn thymic SIRPα(+) DC exhibited increased antigen processing and presentation via the MHC class II but not class I pathway, coupled with an enhanced T-cell stimulatory capacity not seen in thymic plasmacytoid DC and CD8α(+) DC. These findings indicate that the efficiency of thymic DC-mediated negative selection is limited early after birth, and increases with age paralleling expansion of functionally superior thymic SIRPα(+) DC.

  9. Parallel-META 2.0: enhanced metagenomic data analysis with functional annotation, high performance computing and advanced visualization.

    PubMed

    Su, Xiaoquan; Pan, Weihua; Song, Baoxing; Xu, Jian; Ning, Kang

    2014-01-01

    The metagenomic method directly sequences and analyses genome information from microbial communities. The main computational tasks for metagenomic analyses include taxonomical and functional structure analysis for all genomes in a microbial community (also referred to as a metagenomic sample). With the advancement of Next Generation Sequencing (NGS) techniques, the number of metagenomic samples and the data size for each sample are increasing rapidly. Current metagenomic analysis is both data- and computation- intensive, especially when there are many species in a metagenomic sample, and each has a large number of sequences. As such, metagenomic analyses require extensive computational power. The increasing analytical requirements further augment the challenges for computation analysis. In this work, we have proposed Parallel-META 2.0, a metagenomic analysis software package, to cope with such needs for efficient and fast analyses of taxonomical and functional structures for microbial communities. Parallel-META 2.0 is an extended and improved version of Parallel-META 1.0, which enhances the taxonomical analysis using multiple databases, improves computation efficiency by optimized parallel computing, and supports interactive visualization of results in multiple views. Furthermore, it enables functional analysis for metagenomic samples including short-reads assembly, gene prediction and functional annotation. Therefore, it could provide accurate taxonomical and functional analyses of the metagenomic samples in high-throughput manner and on large scale.

  10. Interaction-induced local moments in parallel quantum dots within the functional renormalization group approach

    NASA Astrophysics Data System (ADS)

    Protsenko, V. S.; Katanin, A. A.

    2016-11-01

    We propose a version of the functional renormalization-group (fRG) approach, which is, due to including Litim-type cutoff and switching off (or reducing) the magnetic field during fRG flow, capable of describing a singular Fermi-liquid (SFL) phase, formed due to the presence of local moments in quantum dot structures. The proposed scheme allows one to describe the first-order quantum phase transition from the "singular" to the "regular" paramagnetic phase with applied gate voltage to parallel quantum dots, symmetrically coupled to leads, and shows sizable spin splitting of electronic states in the SFL phase in the limit of vanishing magnetic field H →0 ; the calculated conductance shows good agreement with the results of the numerical renormalization group. Using the proposed fRG approach with the counterterm, we also show that for asymmetric coupling of the leads to the dots the SFL behavior similar to that for the symmetric case persists, but with occupation numbers, effective energy levels, and conductance changing continuously through the quantum phase transition into the SFL phase.

  11. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level

    PubMed Central

    Natarajan, Chandrasekhar; Projecto-Garcia, Joana; Moriyama, Hideaki; Weber, Roy E.; Muñoz-Fuentes, Violeta; Green, Andy J.; Kopuchian, Cecilia; Tubaro, Pablo L.; Alza, Luis; Bulgarella, Mariana; Smith, Matthew M.; Wilson, Robert E.; Fago, Angela; McCracken, Kevin G.; Storz, Jay F.

    2015-01-01

    A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level. PMID:26637114

  12. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.

    PubMed

    Natarajan, Chandrasekhar; Projecto-Garcia, Joana; Moriyama, Hideaki; Weber, Roy E; Muñoz-Fuentes, Violeta; Green, Andy J; Kopuchian, Cecilia; Tubaro, Pablo L; Alza, Luis; Bulgarella, Mariana; Smith, Matthew M; Wilson, Robert E; Fago, Angela; McCracken, Kevin G; Storz, Jay F

    2015-12-01

    A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level.

  13. Parallel phase-shifting digital holography with adaptive function using phase-mode spatial light modulator.

    PubMed

    Lin, Miao; Nitta, Kouichi; Matoba, Osamu; Awatsuji, Yasuhiro

    2012-05-10

    Parallel phase-shifting digital holography using a phase-mode spatial light modulator (SLM) is proposed. The phase-mode SLM implements spatial distribution of phase retardation required in the parallel phase-shifting digital holography. This SLM can also compensate dynamically the phase distortion caused by optical elements such as beam splitters, lenses, and air fluctuation. Experimental demonstration using a static object is presented.

  14. Energy distribution functions of kilovolt ions parallel and perpendicular to the magnetic field of a modified Penning discharge

    NASA Technical Reports Server (NTRS)

    Roth, R. J.

    1973-01-01

    The distribution function of ion energy parallel to the magnetic field of a modified Penning discharge has been measured with a retarding potential energy analyzer. These ions escaped through one of the throats of the magnetic mirror geometry. Simultaneous measurements of the ion energy distribution function perpendicular to the magnetic field have been made with a charge exchange neutral detector. The ion energy distribution functions are approximately Maxwellian, and the parallel and perpendicular kinetic temperatures are equal within experimental error. These results suggest that turbulent processes previously observed in this discharge Maxwellianize the velocity distribution along a radius in velocity space and cause an isotropic energy distribution. When the distributions depart from Maxwellian, they are enhanced above the Maxwellian tail.

  15. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.

  16. Parallel processing in the honeybee olfactory pathway: structure, function, and evolution.

    PubMed

    Rössler, Wolfgang; Brill, Martin F

    2013-11-01

    Animals face highly complex and dynamic olfactory stimuli in their natural environments, which require fast and reliable olfactory processing. Parallel processing is a common principle of sensory systems supporting this task, for example in visual and auditory systems, but its role in olfaction remained unclear. Studies in the honeybee focused on a dual olfactory pathway. Two sets of projection neurons connect glomeruli in two antennal-lobe hemilobes via lateral and medial tracts in opposite sequence with the mushroom bodies and lateral horn. Comparative studies suggest that this dual-tract circuit represents a unique adaptation in Hymenoptera. Imaging studies indicate that glomeruli in both hemilobes receive redundant sensory input. Recent simultaneous multi-unit recordings from projection neurons of both tracts revealed widely overlapping response profiles strongly indicating parallel olfactory processing. Whereas lateral-tract neurons respond fast with broad (generalistic) profiles, medial-tract neurons are odorant specific and respond slower. In analogy to "what-" and "where" subsystems in visual pathways, this suggests two parallel olfactory subsystems providing "what-" (quality) and "when" (temporal) information. Temporal response properties may support across-tract coincidence coding in higher centers. Parallel olfactory processing likely enhances perception of complex odorant mixtures to decode the diverse and dynamic olfactory world of a social insect.

  17. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    NASA Technical Reports Server (NTRS)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  18. Solvable biological evolution models with general fitness functions and multiple mutations in parallel mutation-selection scheme

    NASA Astrophysics Data System (ADS)

    Saakian, David B.; Hu, Chin-Kun; Khachatryan, H.

    2004-10-01

    In a recent paper [Phys. Rev. E 69, 046121 (2004)], we used the Suzuki-Trottere formalism to study a quasispecies biological evolution model in a parallel mutation-selection scheme with a single-peak fitness function and a point mutation. In the present paper, we extend such a study to evolution models with more general fitness functions or multiple mutations in the parallel mutation-selection scheme. We give some analytical equations to define the error thresholds for some general cases of mean-field-like or symmetric mutation schemes and fitness functions. We derive some equations for the dynamics in the case of a point mutation and polynomial fitness functions. We derive exact dynamics for two-point mutations, asymmetric mutations, and the four-value spin model with a single-peak fitness function. The same method is applied for the model with a royal road fitness function. We derive the steady-state distribution for the single-peak fitness function.

  19. Requirements for implementing real-time control functional modules on a hierarchical parallel pipelined system

    NASA Technical Reports Server (NTRS)

    Wheatley, Thomas E.; Michaloski, John L.; Lumia, Ronald

    1989-01-01

    Analysis of a robot control system leads to a broad range of processing requirements. One fundamental requirement of a robot control system is the necessity of a microcomputer system in order to provide sufficient processing capability.The use of multiple processors in a parallel architecture is beneficial for a number of reasons, including better cost performance, modular growth, increased reliability through replication, and flexibility for testing alternate control strategies via different partitioning. A survey of the progression from low level control synchronizing primitives to higher level communication tools is presented. The system communication and control mechanisms of existing robot control systems are compared to the hierarchical control model. The impact of this design methodology on the current robot control systems is explored.

  20. ALIX and ESCRT-I/II function as parallel ESCRT-III recruiters in cytokinetic abscission.

    PubMed

    Christ, Liliane; Wenzel, Eva M; Liestøl, Knut; Raiborg, Camilla; Campsteijn, Coen; Stenmark, Harald

    2016-02-29

    Cytokinetic abscission, the final stage of cell division where the two daughter cells are separated, is mediated by the endosomal sorting complex required for transport (ESCRT) machinery. The ESCRT-III subunit CHMP4B is a key effector in abscission, whereas its paralogue, CHMP4C, is a component in the abscission checkpoint that delays abscission until chromatin is cleared from the intercellular bridge. How recruitment of these components is mediated during cytokinesis remains poorly understood, although the ESCRT-binding protein ALIX has been implicated. Here, we show that ESCRT-II and the ESCRT-II-binding ESCRT-III subunit CHMP6 cooperate with ESCRT-I to recruit CHMP4B, with ALIX providing a parallel recruitment arm. In contrast to CHMP4B, we find that recruitment of CHMP4C relies predominantly on ALIX. Accordingly, ALIX depletion leads to furrow regression in cells with chromosome bridges, a phenotype associated with abscission checkpoint signaling failure. Collectively, our work reveals a two-pronged recruitment of ESCRT-III to the cytokinetic bridge and implicates ALIX in abscission checkpoint signaling.

  1. ALIX and ESCRT-I/II function as parallel ESCRT-III recruiters in cytokinetic abscission

    PubMed Central

    Christ, Liliane; Wenzel, Eva M.; Liestøl, Knut; Raiborg, Camilla

    2016-01-01

    Cytokinetic abscission, the final stage of cell division where the two daughter cells are separated, is mediated by the endosomal sorting complex required for transport (ESCRT) machinery. The ESCRT-III subunit CHMP4B is a key effector in abscission, whereas its paralogue, CHMP4C, is a component in the abscission checkpoint that delays abscission until chromatin is cleared from the intercellular bridge. How recruitment of these components is mediated during cytokinesis remains poorly understood, although the ESCRT-binding protein ALIX has been implicated. Here, we show that ESCRT-II and the ESCRT-II–binding ESCRT-III subunit CHMP6 cooperate with ESCRT-I to recruit CHMP4B, with ALIX providing a parallel recruitment arm. In contrast to CHMP4B, we find that recruitment of CHMP4C relies predominantly on ALIX. Accordingly, ALIX depletion leads to furrow regression in cells with chromosome bridges, a phenotype associated with abscission checkpoint signaling failure. Collectively, our work reveals a two-pronged recruitment of ESCRT-III to the cytokinetic bridge and implicates ALIX in abscission checkpoint signaling. PMID:26929449

  2. A coarse-grained model for DNA-functionalized spherical colloids, revisited: Effective pair potential from parallel replica simulations

    NASA Astrophysics Data System (ADS)

    Theodorakis, Panagiotis E.; Dellago, Christoph; Kahl, Gerhard

    2013-01-01

    We discuss a coarse-grained model recently proposed by Starr and Sciortino [J. Phys.: Condens. Matter 18, L347 (2006), 10.1088/0953-8984/18/26/L02] for spherical particles functionalized with short single DNA strands. The model incorporates two key aspects of DNA hybridization, i.e., the specificity of binding between DNA bases and the strong directionality of hydrogen bonds. Here, we calculate the effective potential between two DNA-functionalized particles of equal size using a parallel replica protocol. We find that the transition from bonded to unbonded configurations takes place at considerably lower temperatures compared to those that were originally predicted using standard simulations in the canonical ensemble. We put particular focus on DNA-decorations of tetrahedral and octahedral symmetry, as they are promising candidates for the self-assembly into a single-component diamond structure. Increasing colloid size hinders hybridization of the DNA strands, in agreement with experimental findings.

  3. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers.

    PubMed

    Spencer, Sarah J; Tamminen, Manu V; Preheim, Sarah P; Guo, Mira T; Briggs, Adrian W; Brito, Ilana L; A Weitz, David; Pitkänen, Leena K; Vigneault, Francois; Juhani Virta, Marko P; Alm, Eric J

    2016-02-01

    Many microbial communities are characterized by high genetic diversity. 16S ribosomal RNA sequencing can determine community members, and metagenomics can determine the functional diversity, but resolving the functional role of individual cells in high throughput remains an unsolved challenge. Here, we describe epicPCR (Emulsion, Paired Isolation and Concatenation PCR), a new technique that links functional genes and phylogenetic markers in uncultured single cells, providing a throughput of hundreds of thousands of cells with costs comparable to one genomic library preparation. We demonstrate the utility of our technique in a natural environment by profiling a sulfate-reducing community in a freshwater lake, revealing both known sulfate reducers and discovering new putative sulfate reducers. Our method is adaptable to any conserved genetic trait and translates genetic associations from diverse microbial samples into a sequencing library that answers targeted ecological questions. Potential applications include identifying functional community members, tracing horizontal gene transfer networks and mapping ecological interactions between microbial cells.

  4. SPARC: Accurate and efficient finite-difference formulation and parallel implementation of Density Functional Theory: Isolated clusters

    NASA Astrophysics Data System (ADS)

    Ghosh, Swarnava; Suryanarayana, Phanish

    2017-03-01

    As the first component of SPARC (Simulation Package for Ab-initio Real-space Calculations), we present an accurate and efficient finite-difference formulation and parallel implementation of Density Functional Theory (DFT) for isolated clusters. Specifically, utilizing a local reformulation of the electrostatics, the Chebyshev polynomial filtered self-consistent field iteration, and a reformulation of the non-local component of the force, we develop a framework using the finite-difference representation that enables the efficient evaluation of energies and atomic forces to within the desired accuracies in DFT. Through selected examples consisting of a variety of elements, we demonstrate that SPARC obtains exponential convergence in energy and forces with domain size; systematic convergence in the energy and forces with mesh-size to reference plane-wave result at comparably high rates; forces that are consistent with the energy, both free from any noticeable 'egg-box' effect; and accurate ground-state properties including equilibrium geometries and vibrational spectra. In addition, for systems consisting up to thousands of electrons, SPARC displays weak and strong parallel scaling behavior that is similar to well-established and optimized plane-wave implementations, but with a significantly reduced prefactor. Overall, SPARC represents an attractive alternative to plane-wave codes for practical DFT simulations of isolated clusters.

  5. The functional significance of cortical reorganization and the parallel development of CI therapy

    PubMed Central

    Taub, Edward; Uswatte, Gitendra; Mark, Victor W.

    2014-01-01

    For the nineteenth and the better part of the twentieth centuries two correlative beliefs were strongly held by almost all neuroscientists and practitioners in the field of neurorehabilitation. The first was that after maturity the adult CNS was hardwired and fixed, and second that in the chronic phase after CNS injury no substantial recovery of function could take place no matter what intervention was employed. However, in the last part of the twentieth century evidence began to accumulate that neither belief was correct. First, in the 1960s and 1970s, in research with primates given a surgical abolition of somatic sensation from a single forelimb, which rendered the extremity useless, it was found that behavioral techniques could convert the limb into an extremity that could be used extensively. Beginning in the late 1980s, the techniques employed with deafferented monkeys were translated into a rehabilitation treatment, termed Constraint Induced Movement therapy or CI therapy, for substantially improving the motor deficit in humans of the upper and lower extremities in the chronic phase after stroke. CI therapy has been applied successfully to other types of damage to the CNS such as traumatic brain injury, cerebral palsy, multiple sclerosis, and spinal cord injury, and it has also been used to improve function in focal hand dystonia and for aphasia after stroke. As this work was proceeding, it was being shown during the 1980s and 1990s that sustained modulation of afferent input could alter the structure of the CNS and that this topographic reorganization could have relevance to the function of the individual. The alteration in these once fundamental beliefs has given rise to important recent developments in neuroscience and neurorehabilitation and holds promise for further increasing our understanding of CNS function and extending the boundaries of what is possible in neurorehabilitation. PMID:25018720

  6. The functional and anatomical organization of marsupial neocortex: Evidence for parallel evolution across mammals

    PubMed Central

    Karlen, Sarah J.; Krubitzer, Leah

    2007-01-01

    Marsupials are a diverse group of mammals that occupy a large range of habitats and have evolved a wide array of unique adaptations. Although they are as diverse as placental mammals, our understanding of marsupial brain organization is more limited. Like placental mammals, marsupials have striking similarities in neocortical organization, such as a constellation of cortical fields including S1, S2, V1, V2, and A1, that are functionally, architectonically, and connectionally distinct. In this review, we describe the general lifestyle and morphological characteristics of all marsupials and the organization of somatosensory, motor, visual, and auditory cortex. For each sensory system, we compare the functional organization and the corticocortical and thalamocortical connections of the neocortex across species. Differences between placental and marsupial species are discussed and the theories on neocortical evolution that have been derived from studying marsupials, particularly the idea of a sensorimotor amalgam, are evaluated. Overall, marsupials inhabit a variety of niches and assume many different lifestyles. For example, marsupials occupy terrestrial, arboreal, burrowing, and aquatic environments; some animals are highly social while others are solitary; and different species are carnivorous, herbivorous, or omnivorous. For each of these adaptations, marsupials have evolved an array of morphological, behavioral, and cortical specializations that are strikingly similar to those observed in placental mammals occupying similar habitats, which indicate that there are constraints imposed on evolving nervous systems that result in recurrent solutions to similar environmental challenges. PMID:17507143

  7. Parallel blind deconvolution of astronomical images based on the fractal energy ratio of the image and regularization of the point spread function

    NASA Astrophysics Data System (ADS)

    Jia, Peng; Cai, Dongmei; Wang, Dong

    2014-11-01

    A parallel blind deconvolution algorithm is presented. The algorithm contains the constraints of the point spread function (PSF) derived from the physical process of the imaging. Additionally, in order to obtain an effective restored image, the fractal energy ratio is used as an evaluation criterion to estimate the quality of the image. This algorithm is fine-grained parallelized to increase the calculation speed. Results of numerical experiments and real experiments indicate that this algorithm is effective.

  8. Parallel changes in cortical neuron biochemistry and motor function in protein-energy malnourished adult rats.

    PubMed

    Alaverdashvili, Mariam; Hackett, Mark J; Caine, Sally; Paterson, Phyllis G

    2017-04-01

    While protein-energy malnutrition in the adult has been reported to induce motor abnormalities and exaggerate motor deficits caused by stroke, it is not known if alterations in mature cortical neurons contribute to the functional deficits. Therefore, we explored if PEM in adult rats provoked changes in the biochemical profile of neurons in the forelimb and hindlimb regions of the motor cortex. Fourier transform infrared spectroscopic imaging using a synchrotron generated light source revealed for the first time altered lipid composition in neurons and subcellular domains (cytosol and nuclei) in a cortical layer and region-specific manner. This change measured by the area under the curve of the δ(CH2) band may indicate modifications in membrane fluidity. These PEM-induced biochemical changes were associated with the development of abnormalities in forelimb use and posture. The findings of this study provide a mechanism by which PEM, if not treated, could exacerbate the course of various neurological disorders and diminish treatment efficacy.

  9. Parallel transmit excitation at 1.5 T based on the minimization of a driving function for device heating

    PubMed Central

    Gudino, N.; Sonmez, M.; Yao, Z.; Baig, T.; Nielles-Vallespin, S.; Faranesh, A. Z.; Lederman, R. J.; Martens, M.; Balaban, R. S.; Hansen, M. S.; Griswold, M. A.

    2015-01-01

    Purpose: To provide a rapid method to reduce the radiofrequency (RF) E-field coupling and consequent heating in long conductors in an interventional MRI (iMRI) setup. Methods: A driving function for device heating (W) was defined as the integration of the E-field along the direction of the wire and calculated through a quasistatic approximation. Based on this function, the phases of four independently controlled transmit channels were dynamically changed in a 1.5 T MRI scanner. During the different excitation configurations, the RF induced heating in a nitinol wire immersed in a saline phantom was measured by fiber-optic temperature sensing. Additionally, a minimization of W as a function of phase and amplitude values of the different channels and constrained by the homogeneity of the RF excitation field (B1) over a region of interest was proposed and its results tested on the benchtop. To analyze the validity of the proposed method, using a model of the array and phantom setup tested in the scanner, RF fields and SAR maps were calculated through finite-difference time-domain (FDTD) simulations. In addition to phantom experiments, RF induced heating of an active guidewire inserted in a swine was also evaluated. Results: In the phantom experiment, heating at the tip of the device was reduced by 92% when replacing the body coil by an optimized parallel transmit excitation with same nominal flip angle. In the benchtop, up to 90% heating reduction was measured when implementing the constrained minimization algorithm with the additional degree of freedom given by independent amplitude control. The computation of the optimum phase and amplitude values was executed in just 12 s using a standard CPU. The results of the FDTD simulations showed similar trend of the local SAR at the tip of the wire and measured temperature as well as to a quadratic function of W, confirming the validity of the quasistatic approach for the presented problem at 64 MHz. Imaging and heating

  10. Parallel transmit excitation at 1.5 T based on the minimization of a driving function for device heating

    SciTech Connect

    Gudino, N.; Sonmez, M.; Nielles-Vallespin, S.; Faranesh, A. Z.; Lederman, R. J.; Balaban, R. S.; Hansen, M. S.; Yao, Z.; Baig, T.; Martens, M.; Griswold, M. A.

    2015-01-15

    Purpose: To provide a rapid method to reduce the radiofrequency (RF) E-field coupling and consequent heating in long conductors in an interventional MRI (iMRI) setup. Methods: A driving function for device heating (W) was defined as the integration of the E-field along the direction of the wire and calculated through a quasistatic approximation. Based on this function, the phases of four independently controlled transmit channels were dynamically changed in a 1.5 T MRI scanner. During the different excitation configurations, the RF induced heating in a nitinol wire immersed in a saline phantom was measured by fiber-optic temperature sensing. Additionally, a minimization of W as a function of phase and amplitude values of the different channels and constrained by the homogeneity of the RF excitation field (B{sub 1}) over a region of interest was proposed and its results tested on the benchtop. To analyze the validity of the proposed method, using a model of the array and phantom setup tested in the scanner, RF fields and SAR maps were calculated through finite-difference time-domain (FDTD) simulations. In addition to phantom experiments, RF induced heating of an active guidewire inserted in a swine was also evaluated. Results: In the phantom experiment, heating at the tip of the device was reduced by 92% when replacing the body coil by an optimized parallel transmit excitation with same nominal flip angle. In the benchtop, up to 90% heating reduction was measured when implementing the constrained minimization algorithm with the additional degree of freedom given by independent amplitude control. The computation of the optimum phase and amplitude values was executed in just 12 s using a standard CPU. The results of the FDTD simulations showed similar trend of the local SAR at the tip of the wire and measured temperature as well as to a quadratic function of W, confirming the validity of the quasistatic approach for the presented problem at 64 MHz. Imaging and heating

  11. Combining fMRI and SNP Data to Investigate Connections Between Brain Function and Genetics Using Parallel ICA

    PubMed Central

    Liu, Jingyu; Pearlson, Godfrey; Windemuth, Andreas; Ruano, Gualberto; Perrone-Bizzozero, Nora I.; Calhoun, Vince

    2009-01-01

    There is current interest in understanding genetic influences on both healthy and disordered brain function. We assessed brain function with functional magnetic resonance imaging (fMRI) data collected during an auditory oddball task—detecting an infrequent sound within a series of frequent sounds. Then, task-related imaging findings were utilized as potential intermediate phenotypes (endophenotypes) to investigate genomic factors derived from a single nucleotide polymorphism (SNP) array. Our target is the linkage of these genomic factors to normal/abnormal brain functionality. We explored parallel independent component analysis (paraICA) as a new method for analyzing multimodal data. The method was aimed to identify simultaneously independent components of each modality and the relationships between them. When 43 healthy controls and 20 schizophrenia patients, all Caucasian, were studied, we found a correlation of 0.38 between one fMRI component and one SNP component. This fMRI component consisted mainly of parietal lobe activations. The relevant SNP component was contributed to significantly by 10 SNPs located in genes, including those coding for the nicotinic α-7cholinergic receptor, aromatic amino acid decarboxylase, disrupted in schizophrenia 1, among others. Both fMRI and SNP components showed significant differences in loading parameters between the schizophrenia and control groups (P = 0.0006 for the fMRI component; P = 0.001 for the SNP component). In summary, we constructed a framework to identify interactions between brain functional and genetic information; our findings provide a proof-of-concept that genomic SNP factors can be investigated by using endophenotypic imaging findings in a multivariate format. PMID:18072279

  12. EUPDF: Eulerian Monte Carlo Probability Density Function Solver for Applications With Parallel Computing, Unstructured Grids, and Sprays

    NASA Technical Reports Server (NTRS)

    Raju, M. S.

    1998-01-01

    The success of any solution methodology used in the study of gas-turbine combustor flows depends a great deal on how well it can model the various complex and rate controlling processes associated with the spray's turbulent transport, mixing, chemical kinetics, evaporation, and spreading rates, as well as convective and radiative heat transfer and other phenomena. The phenomena to be modeled, which are controlled by these processes, often strongly interact with each other at different times and locations. In particular, turbulence plays an important role in determining the rates of mass and heat transfer, chemical reactions, and evaporation in many practical combustion devices. The influence of turbulence in a diffusion flame manifests itself in several forms, ranging from the so-called wrinkled, or stretched, flamelets regime to the distributed combustion regime, depending upon how turbulence interacts with various flame scales. Conventional turbulence models have difficulty treating highly nonlinear reaction rates. A solution procedure based on the composition joint probability density function (PDF) approach holds the promise of modeling various important combustion phenomena relevant to practical combustion devices (such as extinction, blowoff limits, and emissions predictions) because it can account for nonlinear chemical reaction rates without making approximations. In an attempt to advance the state-of-the-art in multidimensional numerical methods, we at the NASA Lewis Research Center extended our previous work on the PDF method to unstructured grids, parallel computing, and sprays. EUPDF, which was developed by M.S. Raju of Nyma, Inc., was designed to be massively parallel and could easily be coupled with any existing gas-phase and/or spray solvers. EUPDF can use an unstructured mesh with mixed triangular, quadrilateral, and/or tetrahedral elements. The application of the PDF method showed favorable results when applied to several supersonic

  13. Postsynaptic inositol 1,4,5-trisphosphate signaling maintains presynaptic function of parallel fiber–Purkinje cell synapses via BDNF

    PubMed Central

    Furutani, Kazuharu; Okubo, Yohei; Kakizawa, Sho; Iino, Masamitsu

    2006-01-01

    The maintenance of synaptic functions is essential for neuronal information processing, but cellular mechanisms that maintain synapses in the adult brain are not well understood. Here, we report an activity-dependent maintenance mechanism of parallel fiber (PF)–Purkinje cell (PC) synapses in the cerebellum. When postsynaptic metabotropic glutamate receptor (mGluR) or inositol 1,4,5-trisphosphate (IP3) signaling was chronically inhibited in vivo, PF–PC synaptic strength decreased because of a decreased transmitter release probability. The same effects were observed when PF activity was inhibited in vivo by the suppression of NMDA receptor-mediated inputs to granule cells. PF–PC synaptic strength similarly decreased after the in vivo application of an antibody against brain-derived neurotrophic factor (BDNF). Furthermore, the weakening of synaptic connection caused by the blockade of mGluR–IP3 signaling was reversed by the in vivo application of BDNF. These results indicate that a signaling cascade comprising PF activity, postsynaptic mGluR–IP3 signaling and subsequent BDNF signaling maintains presynaptic functions in the mature cerebellum. PMID:16709674

  14. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication

    NASA Astrophysics Data System (ADS)

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimension (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’ shape microchannel. The key novelty of our approach lies on rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of the polystyrene (PS) particles and cancer cells with different sizes. The filter can be cleaned by reversing the flow and reused for many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips.

  15. High efficiency integration of three-dimensional functional microdevices inside a microfluidic chip by using femtosecond laser multifoci parallel microfabrication

    PubMed Central

    Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji

    2016-01-01

    High efficiency fabrication and integration of three-dimension (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’ shape microchannel. The key novelty of our approach lies on rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of the polystyrene (PS) particles and cancer cells with different sizes. The filter can be cleaned by reversing the flow and reused for many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips. PMID:26818119

  16. Super-resolution non-parametric deconvolution in modelling the radial response function of a parallel plate ionization chamber.

    PubMed

    Kulmala, A; Tenhunen, M

    2012-11-07

    The signal of the dosimetric detector is generally dependent on the shape and size of the sensitive volume of the detector. In order to optimize the performance of the detector and reliability of the output signal the effect of the detector size should be corrected or, at least, taken into account. The response of the detector can be modelled using the convolution theorem that connects the system input (actual dose), output (measured result) and the effect of the detector (response function) by a linear convolution operator. We have developed the super-resolution and non-parametric deconvolution method for determination of the cylinder symmetric ionization chamber radial response function. We have demonstrated that the presented deconvolution method is able to determine the radial response for the Roos parallel plate ionization chamber with a better than 0.5 mm correspondence with the physical measures of the chamber. In addition, the performance of the method was proved by the excellent agreement between the output factors of the stereotactic conical collimators (4-20 mm diameter) measured by the Roos chamber, where the detector size is larger than the measured field, and the reference detector (diode). The presented deconvolution method has a potential in providing reference data for more accurate physical models of the ionization chamber as well as for improving and enhancing the performance of the detectors in specific dosimetric problems.

  17. A three-way parallel ICA approach to analyze links among genetics, brain structure and brain function.

    PubMed

    Vergara, Victor M; Ulloa, Alvaro; Calhoun, Vince D; Boutte, David; Chen, Jiayu; Liu, Jingyu

    2014-09-01

    Multi-modal data analysis techniques, such as the Parallel Independent Component Analysis (pICA), are essential in neuroscience, medical imaging and genetic studies. The pICA algorithm allows the simultaneous decomposition of up to two data modalities achieving better performance than separate ICA decompositions and enabling the discovery of links between modalities. However, advances in data acquisition techniques facilitate the collection of more than two data modalities from each subject. Examples of commonly measured modalities include genetic information, structural magnetic resonance imaging (MRI) and functional MRI. In order to take full advantage of the available data, this work extends the pICA approach to incorporate three modalities in one comprehensive analysis. Simulations demonstrate the three-way pICA performance in identifying pairwise links between modalities and estimating independent components which more closely resemble the true sources than components found by pICA or separate ICA analyses. In addition, the three-way pICA algorithm is applied to real experimental data obtained from a study that investigate genetic effects on alcohol dependence. Considered data modalities include functional MRI (contrast images during alcohol exposure paradigm), gray matter concentration images from structural MRI and genetic single nucleotide polymorphism (SNP). The three-way pICA approach identified links between a SNP component (pointing to brain function and mental disorder associated genes, including BDNF, GRIN2B and NRG1), a functional component related to increased activation in the precuneus area, and a gray matter component comprising part of the default mode network and the caudate. Although such findings need further verification, the simulation and in-vivo results validate the three-way pICA algorithm presented here as a useful tool in biomedical data fusion applications.

  18. Parallel Functional Activity Profiling Reveals Valvulopathogens Are Potent 5-Hydroxytryptamine2B Receptor Agonists: Implications for Drug Safety Assessment

    PubMed Central

    Huang, Xi-Ping; Setola, Vincent; Yadav, Prem N.; Allen, John A.; Rogan, Sarah C.; Hanson, Bonnie J.; Revankar, Chetana; Robers, Matt; Doucette, Chris

    2009-01-01

    Drug-induced valvular heart disease (VHD) is a serious side effect of a few medications, including some that are on the market. Pharmacological studies of VHD-associated medications (e.g., fenfluramine, pergolide, methysergide, and cabergoline) have revealed that they and/or their metabolites are potent 5-hydroxytryptamine2B (5-HT2B) receptor agonists. We have shown that activation of 5-HT2B receptors on human heart valve interstitial cells in vitro induces a proliferative response reminiscent of the fibrosis that typifies VHD. To identify current or future drugs that might induce VHD, we screened approximately 2200 U.S. Food and Drug Administration (FDA)-approved or investigational medications to identify 5-HT2B receptor agonists, using calcium-based high-throughput screening. Of these 2200 compounds, 27 were 5-HT2B receptor agonists (hits); 14 of these had previously been identified as 5-HT2B receptor agonists, including seven bona fide valvulopathogens. Six of the hits (guanfacine, quinidine, xylometazoline, oxymetazoline, fenoldopam, and ropinirole) are approved medications. Twenty-three of the hits were then “functionally profiled” (i.e., assayed in parallel for 5-HT2B receptor agonism using multiple readouts to test for functional selectivity). In these assays, the known valvulopathogens were efficacious at concentrations as low as 30 nM, whereas the other compounds were less so. Hierarchical clustering analysis of the pEC50 data revealed that ropinirole (which is not associated with valvulopathy) was clearly segregated from known valvulopathogens. Taken together, our data demonstrate that patterns of 5-HT2B receptor functional selectivity might be useful for identifying compounds likely to induce valvular heart disease. PMID:19570945

  19. Functional Traits in Parallel Evolutionary Radiations and Trait-Environment Associations in the Cape Floristic Region of South Africa.

    PubMed

    Mitchell, Nora; Moore, Timothy E; Mollmann, Hayley Kilroy; Carlson, Jane E; Mocko, Kerri; Martinez-Cabrera, Hugo; Adams, Christopher; Silander, John A; Jones, Cynthia S; Schlichting, Carl D; Holsinger, Kent E

    2015-04-01

    Evolutionary radiations with extreme levels of diversity present a unique opportunity to study the role of the environment in plant evolution. If environmental adaptation played an important role in such radiations, we expect to find associations between functional traits and key climatic variables. Similar trait-environment associations across clades may reflect common responses, while contradictory associations may suggest lineage-specific adaptations. Here, we explore trait-environment relationships in two evolutionary radiations in the fynbos biome of the highly biodiverse Cape Floristic Region (CFR) of South Africa. Protea and Pelargonium are morphologically and evolutionarily diverse genera that typify the CFR yet are substantially different in growth form and morphology. Our analytical approach employs a Bayesian multiple-response generalized linear mixed-effects model, taking into account covariation among traits and controlling for phylogenetic relationships. Of the pairwise trait-environment associations tested, 6 out of 24 were in the same direction and 2 out of 24 were in opposite directions, with the latter apparently reflecting alternative life-history strategies. These findings demonstrate that trait diversity within two plant lineages may reflect both parallel and idiosyncratic responses to the environment, rather than all taxa conforming to a global-scale pattern. Such insights are essential for understanding how trait-environment associations arise and how they influence species diversification.

  20. Kinematic Modeling and Function Generation for Non-linear Curves Using 5R Double Arm Parallel Manipulator

    NASA Astrophysics Data System (ADS)

    Keshavkumar Kamaliya, Parth; Patel, Yashavant Kumar Dashrathlal

    2016-01-01

    Double arm configuration using parallel manipulator mimic the human arm motions either for planar or spatial space. These configurations are currently lucrative for researchers as it also replaces human workers without major redesign of work-place in industries. Humans' joint ranges limitation of arms can be resolved by replacement of either revolute or spherical joints in manipulator. Hence, the scope of maximum workspace utilization is prevailed. Planar configuration with five revolute joints (5R) is considered to imitate human arm motions in a plane using Double Arm Manipulator (DAM). Position analysis for tool that can be held in end links of configuration is carried out using Pro/mechanism in Creo® as well as SimMechanics. D-H parameters are formulated and its results derived using developed MATLAB programs are compared with mechanism simulation as well as SimMechanics results. Inverse kinematics model is developed for trajectory planning in order to trace tool trajectory in a continuous and smooth sequence. Polynomial functions are derived for position, velocity and acceleration for linear and non-linear curves in joint space. Analytical results obtained for trajectory planning are validated with simulation results of Creo®.

  1. Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

    NASA Astrophysics Data System (ADS)

    Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

    2010-12-01

    Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization. New version program summaryProgram Title: PROFESS Catalogue identifier: AEBN_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 68 721 No. of bytes in distributed program, including test data, etc.: 1 708 547 Distribution format: tar.gz Programming language: Fortran 90 Computer

  2. Casimir amplitudes and capillary condensation of near-critical fluids between parallel plates: renormalized local functional theory.

    PubMed

    Okamoto, Ryuichi; Onuki, Akira

    2012-03-21

    We investigate the critical behavior of a near-critical fluid confined between two parallel plates in contact with a reservoir by calculating the order parameter profile and the Casimir amplitudes (for the force density and for the grand potential). Our results are applicable to one-component fluids and binary mixtures. We assume that the walls absorb one of the fluid components selectively for binary mixtures. We propose a renormalized local functional theory accounting for the fluctuation effects. Analysis is performed in the plane of the temperature T and the order parameter in the reservoir ψ(∞). Our theory is universal if the physical quantities are scaled appropriately. If the component favored by the walls is slightly poor in the reservoir, there appears a line of first-order phase transition of capillary condensation outside the bulk coexistence curve. The excess adsorption changes discontinuously between condensed and noncondensed states at the transition. With increasing T, the transition line ends at a capillary critical point T=T(c) (ca) slightly lower than the bulk critical temperature T(c) for the upper critical solution temperature. The Casimir amplitudes are larger than their critical point values by 10-100 times at off-critical compositions near the capillary condensation line.

  3. Parallel rendering

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  4. Cobtorin target analysis reveals that pectin functions in the deposition of cellulose microfibrils in parallel with cortical microtubules.

    PubMed

    Yoneda, Arata; Ito, Takuya; Higaki, Takumi; Kutsuna, Natsumaro; Saito, Tamio; Ishimizu, Takeshi; Osada, Hiroyuki; Hasezawa, Seiichiro; Matsui, Minami; Demura, Taku

    2010-11-01

    Cellulose and pectin are major components of primary cell walls in plants, and it is believed that their mechanical properties are important for cell morphogenesis. It has been hypothesized that cortical microtubules guide the movement of cellulose microfibril synthase in a direction parallel with the microtubules, but the mechanism by which this alignment occurs remains unclear. We have previously identified cobtorin as an inhibitor that perturbs the parallel relationship between cortical microtubules and nascent cellulose microfibrils. In this study, we searched for the protein target of cobtorin, and we found that overexpression of pectin methylesterase and polygalacturonase suppressed the cobtorin-induced cell-swelling phenotype. Furthermore, treatment with polygalacturonase restored the deposition of cellulose microfibrils in the direction parallel with cortical microtubules, and cobtorin perturbed the distribution of methylated pectin. These results suggest that control over the properties of pectin is important for the deposition of cellulose microfibrils and/or the maintenance of their orientation parallel with the cortical microtubules.

  5. Three pillars for achieving quantum mechanical molecular dynamics simulations of huge systems: Divide-and-conquer, density-functional tight-binding, and massively parallel computation.

    PubMed

    Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi

    2016-08-05

    The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented to the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program, a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc.

  6. Parallel machines: Parallel machine languages

    SciTech Connect

    Iannucci, R.A. )

    1990-01-01

    This book presents a framework for understanding the tradeoffs between the conventional view and the dataflow view with the objective of discovering the critical hardware structures which must be present in any scalable, general-purpose parallel computer to effectively tolerate latency and synchronization costs. The author presents an approach to scalable general purpose parallel computation. Linguistic Concerns, Compiling Issues, Intermediate Language Issues, and hardware/technological constraints are presented as a combined approach to architectural Develoement. This book presents the notion of a parallel machine language.

  7. Parallel pipelining

    SciTech Connect

    Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.

    1995-09-01

    In this paper the authors introduce the idea of parallel pipelining for water lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as N = (R/r){sup {alpha}}, where r and R are the radii of the small and large pipes, respectively, and {alpha} = 4 or 19/7 when the lubricating water flow is laminar or turbulent.

  8. rasterEngine: an easy-to-use R function for applying complex geostatistical models to raster datasets in a parallel computing environment

    NASA Astrophysics Data System (ADS)

    Greenberg, J. A.

    2013-12-01

    As geospatial analyses progress in tandem with increasing availability of large complex geographic data sets and high performance computing (HPC), there is an increasing gap in the ability of end-user tools to take advantage of these advances. Specifically, the practical implementation of complex statistical models on large gridded geographic datasets (e.g. remote sensing analysis, species distribution mapping, topographic transformations, and local neighborhood analyses) currently requires a significant knowledge base. A user must be proficient in the chosen model as well as the nuances of scientific programming, raster data models, memory management, parallel computing, and system design. This is further complicated by the fact that many of the cutting-edge analytical tools were developed for non-geospatial datasets and are not part of standard GIS packages, but are available in scientific computing languages such as R and MATLAB. We present a computing function 'rasterEngine' written in the R scientific computing language and part of the CRAN package 'spatial.tools' with these challenges in mind. The goal of rasterEngine is to allow a user to quickly develop and apply analytical models within the R computing environment to arbitrarily large gridded datasets, taking advantage of available parallel computing resources, and without requiring a deep understanding of HPC and raster data models. We provide several examples of rasterEngine being used to solve common grid based analyses, including remote sensing image analyses, topographic transformations, and species distribution modeling. With each example, the parallel processing performance results are presented.

  9. Analysis of speedup as function of block size and cluster size for parallel feed-forward neural networks on a Beowulf cluster.

    PubMed

    Mörchen, Fabian

    2004-03-01

    The performance of feed-forward neural networks trained with the backpropagation algorithm on a dedicated Beowulf cluster is analyzed. The concept of training set parallelism is applied. A new model for run time and speedup prediction is developed. With the model the speedup and efficiency of one iteration of the neural networks can be estimated as a function of block size and cluster size. The model is applied to three example problems representing different applications and network architectures. The estimation of the model has a higher accuracy than traditional methods for run time estimation and can be efficiently calculated. Experiments show that speedup of one iteration does not necessarily translate to a shorter training time toward a given error level. To overcome this problem a heuristic extension to training set parallelism called weight averaging is developed. The results show that training in parallel should only be done on clusters with high performance network connections or a multiprocessor machine. A rule of thumb is given for how much network performance of the cluster is needed to achieve speedup of the training time for a neural network.

  10. toca-1 is in a novel pathway that functions in parallel with a SUN-KASH nuclear envelope bridge to move nuclei in Caenorhabditis elegans.

    PubMed

    Chang, Yu-Tai; Dranow, Daniel; Kuhn, Jonathan; Meyerzon, Marina; Ngo, Minh; Ratner, Dmitry; Warltier, Karin; Starr, Daniel A

    2013-01-01

    Moving the nucleus to an intracellular location is critical to many fundamental cell and developmental processes, including cell migration, differentiation, fertilization, and establishment of cellular polarity. Bridges of SUN and KASH proteins span the nuclear envelope and mediate many nuclear positioning events, but other pathways function independently through poorly characterized mechanisms. To identify and characterize novel mechanisms of nuclear migration, we conducted a nonbiased forward genetic screen for mutations that enhanced the nuclear migration defect of unc-84, which encodes a SUN protein. In Caenorhabditis elegans larvae, failure of hypodermal P-cell nuclear migration results in uncoordinated and egg-laying-defective animals. The process of P-cell nuclear migration in unc-84 null animals is temperature sensitive; at 25° migration fails in unc-84 mutants, but at 15° the migration occurs normally. We hypothesized that an additional pathway functions in parallel to the unc-84 pathway to move P-cell nuclei at 15°. In support of our hypothesis, forward genetic screens isolated eight emu (enhancer of the nuclear migration defect of unc-84) mutations that disrupt nuclear migration only in a null unc-84 background. The yc20 mutant was determined to carry a mutation in the toca-1 gene. TOCA-1 functions to move P-cell nuclei in a cell-autonomous manner. TOCA-1 is conserved in humans, where it functions to nucleate and organize actin during endocytosis. Therefore, we have uncovered a player in a previously unknown, likely actin-dependent, pathway that functions to move nuclei in parallel to SUN-KASH bridges. The other emu mutations potentially represent other components of this novel pathway.

  11. High-Resolution Functional Mapping of the Venezuelan Equine Encephalitis Virus Genome by Insertional Mutagenesis and Massively Parallel Sequencing

    DTIC Science & Technology

    2010-10-14

    functions of Alphavirus non- structural proteins has been elucidated through molecular and classical genetics studies of two prototypical alphaviruses ...sensitive mutants have been used extensively to elucidate replication and virulence properties of alphaviruses . To demonstrate the utility of our functional... Alphavirus replication , and has helped to identify the activities and interactions of many viral proteins [10,21,37,38,39,40,41,42, 43,44]. We

  12. Exploration of the functional hierarchy of the basal layer of human epidermis at the single-cell level using parallel clonal microcultures of keratinocytes.

    PubMed

    Fortunel, Nicolas O; Cadio, Emmanuelle; Vaigot, Pierre; Chadli, Loubna; Moratille, Sandra; Bouet, Stéphan; Roméo, Paul-Henri; Martin, Michèle T

    2010-04-01

    The basal layer of human epidermis contains both stem cells and keratinocyte progenitors. Because of this cellular heterogeneity, the development of methods suitable for investigations at a clonal level is dramatically needed. Here, we describe a new method that allows multi-parallel clonal cultures of basal keratinocytes. Immediately after extraction from tissue samples, cells are sorted by flow cytometry based on their high integrin-alpha 6 expression and plated individually in microculture wells. This automated cell deposition process enables large-scale characterization of primary clonogenic capacities. The resulting clonal growth profile provided a precise assessment of basal keratinocyte hierarchy, as the size distribution of 14-day-old clones ranged from abortive to highly proliferative clones containing 1.7 x 10(5) keratinocytes (17.4 cell doublings). Importantly, these 14-day-old primary clones could be used to generate three-dimensional reconstructed epidermis with the progeny of a single cell. In long-term cultures, a fraction of highly proliferative clones could sustain extensive expansion of >100 population doublings over 14 weeks and exhibited long-term epidermis reconstruction potency, thus fulfilling candidate stem cell functional criteria. In summary, parallel clonal microcultures provide a relevant model for single-cell studies on interfollicular keratinocytes, which could be also used in other epithelial models, including hair follicle and cornea. The data obtained using this system support the hierarchical model of basal keratinocyte organization in human interfollicular epidermis.

  13. A study of parallelizing O(N) Green-function-based Monte Carlo method for many fermions coupled with classical degrees of freedom

    NASA Astrophysics Data System (ADS)

    Zhang, Shixun; Yamagia, Shinichi; Yunoki, Seiji

    2013-08-01

    Models of fermions interacting with classical degrees of freedom are applied to a large variety of systems in condensed matter physics. For this class of models, Weiße [Phys. Rev. Lett. 102, 150604 (2009)] has recently proposed a very efficient numerical method, called O(N) Green-Function-Based Monte Carlo (GFMC) method, where a kernel polynomial expansion technique is used to avoid the full numerical diagonalization of the fermion Hamiltonian matrix of size N, which usually costs O(N3) computational complexity. Motivated by this background, in this paper we apply the GFMC method to the double exchange model in three spatial dimensions. We mainly focus on the implementation of GFMC method using both MPI on a CPU-based cluster and Nvidia's Compute Unified Device Architecture (CUDA) programming techniques on a GPU-based (Graphics Processing Unit based) cluster. The time complexity of the algorithm and the parallel implementation details on the clusters are discussed. We also show the performance scaling for increasing Hamiltonian matrix size and increasing number of nodes, respectively. The performance evaluation indicates that for a 323 Hamiltonian a single GPU shows higher performance equivalent to more than 30 CPU cores parallelized using MPI.

  14. Parallel processor engine model program

    NASA Technical Reports Server (NTRS)

    Mclaughlin, P.

    1984-01-01

    The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.

  15. Drawing a high-resolution functional map of adeno-associated virus capsid by massively parallel sequencing.

    PubMed

    Adachi, Kei; Enoki, Tatsuji; Kawano, Yasuhiro; Veraz, Michael; Nakai, Hiroyuki

    2014-01-01

    Adeno-associated virus (AAV) capsid engineering is an emerging approach to advance gene therapy. However, a systematic analysis on how each capsid amino acid contributes to multiple functions remains challenging. Here we show proof-of-principle and successful application of a novel approach, termed AAV Barcode-Seq, that allows us to characterize phenotypes of hundreds of different AAV strains in a high-throughput manner and therefore overcomes technical difficulties in the systematic analysis. In this approach, we generate DNA barcode-tagged AAV libraries and determine a spectrum of phenotypes of each AAV strain by Illumina barcode sequencing. By applying this method to AAV capsid mutant libraries tagged with DNA barcodes, we can draw a high-resolution map of AAV capsid amino acids important for the structural integrity and functions including receptor binding, tropism, neutralization and blood clearance. Thus, Barcode-Seq provides a new tool to generate a valuable resource for virus and gene therapy research.

  16. Functional development of mechanosensitive hair cells in stem cell-derived organoids parallels native vestibular hair cells

    PubMed Central

    Liu, Xiao-Ping; Koehler, Karl R.; Mikosz, Andrew M.; Hashino, Eri; Holt, Jeffrey R.

    2016-01-01

    Inner ear sensory epithelia contain mechanosensitive hair cells that transmit information to the brain through innervation with bipolar neurons. Mammalian hair cells do not regenerate and are limited in number. Here we investigate the potential to generate mechanosensitive hair cells from mouse embryonic stem cells in a three-dimensional (3D) culture system. The system faithfully recapitulates mouse inner ear induction followed by self-guided development into organoids that morphologically resemble inner ear vestibular organs. We find that organoid hair cells acquire mechanosensitivity equivalent to functionally mature hair cells in postnatal mice. The organoid hair cells also progress through a similar dynamic developmental pattern of ion channel expression, reminiscent of two subtypes of native vestibular hair cells. We conclude that our 3D culture system can generate large numbers of fully functional sensory cells which could be used to investigate mechanisms of inner ear development and disease as well as regenerative mechanisms for inner ear repair. PMID:27215798

  17. A parallel implementation of the analytic nuclear gradient for time-dependent density functional theory within the Tamm-Dancoff approximation

    NASA Astrophysics Data System (ADS)

    Liu, Fenglai; Gan, Zhengting; Shao, Yihan; Hsu, Chao-Ping; Dreuw, Andreas; Head-Gordon, Martin; Miller, Benjamin T.; Brooks, Bernard R.; Yu, Jian-Guo; Furlani, Thomas R.; Kong, Jing

    2010-10-01

    We derived the analytic gradient for the excitation energies from a time-dependent density functional theory calculation within the Tamm-Dancoff approximation (TDDFT/TDA) using Gaussian atomic orbital basis sets, and introduced an efficient serial and parallel implementation. Some timing results are shown from a B3LYP/6-31G**/SG-1-grid calculation on zincporphyrin. We also performed TDDFT/TDA geometry optimizations for low-lying excited states of 20 small molecules, and compared adiabatic excitation energies and optimized geometry parameters to experimental values using the B3LYP and ωB97 functionals. There are only minor differences between TDDFT and TDA optimized excited state geometries and adiabatic excitation energies. Optimized bond lengths are in better agreement with experiment for both functionals than either CC2 or SOS-CIS(D0), while adiabatic excitation energies are in similar or slightly poorer agreement. Optimized bond angles with both functionals are more accurate than CIS values, but less accurate than either CC2 or SOS-CIS(D0) ones.

  18. Resonance line transfer calculations by doubling thin layers. I - Comparison with other techniques. II - The use of the R-parallel redistribution function. [planetary atmospheres

    NASA Technical Reports Server (NTRS)

    Yelle, Roger V.; Wallace, Lloyd

    1989-01-01

    A versatile and efficient technique for the solution of the resonance line scattering problem with frequency redistribution in planetary atmospheres is introduced. Similar to the doubling approach commonly used in monochromatic scattering problems, the technique has been extended to include the frequency dependence of the radiation field. Methods for solving problems with external or internal sources and coupled spectral lines are presented, along with comparison of some sample calculations with results from Monte Carlo and Feautrier techniques. The doubling technique has also been applied to the solution of resonance line scattering problems where the R-parallel redistribution function is appropriate, both neglecting and including polarization as developed by Yelle and Wallace (1989). With the constraint that the atmosphere is illuminated from the zenith, the only difficulty of consequence is that of performing precise frequency integrations over the line profiles. With that problem solved, it is no longer necessary to use the Monte Carlo method to solve this class of problem.

  19. An intangible energy in the functioning biosystem. II: Useful parallels with circuit theory and with non-linear optics.

    PubMed

    Reid, B L

    1995-06-01

    The argument is developed that a structure and function already exists in selected inanimate systems for an intangible energy dissipating these systems and that, in so doing, this energy exhibits certain properties, readily recognised in the functioning biosystem. The central thesis is that, during dissipation, the structure of the biosystem affords opportunity for an enhanced display of these properties, so that this structure can be rationally recognised as obligatory in the transition, inanimate to animate matter. The systems chosen are those of reactance in linear circuit theory of electronics, and some recent developments in non-linear optics, both of which rely on imaginary or quantal force to display observable effects. Discussion occurs on the fashion which the development of a statistical formalism as a basis for the study of squeezed states of light in these non-linear systems, has, at the same time, overcome a long standing veto on the practical use of quantal energy associated with the Uncertainty Principle of Heisenberg. These ideas are used to vindicate the suggestion that a theoretical basis is presently available for an engineering type approach, toward an intangible force as it exists in the biosystem. The origins and properties of such a force continue to be considered by many as immersed in mysticism.

  20. Processes setting the structure of the electron distribution function within the exhausts of anti-parallel reconnection

    NASA Astrophysics Data System (ADS)

    Egedal, J.; Wetherton, B.; Daughton, W.; Le, A.

    2016-12-01

    In situ spacecraft observations within the exhausts of magnetic reconnection document a large variation in the velocity space structure of the electron distribution function. Multiple mechanisms help govern the underlying electron dynamics, yielding a range of signatures for collisionless reconnection. These signatures include passing beams of electrons separated by well-defined boundaries from betatron heated/cooled trapped electrons. The present study emphasizes how localized regions of non-adiabatic electron dynamics can mix electrons across the trapped/passing boundaries and impact the form of the electron distributions in the full width of the exhaust. While our study is based on 2D simulations, the described principles shaping the velocity space distributions also apply to 3D geometries making our findings relevant to spacecraft observation of reconnection in the Earth's magnetosphere.

  1. NOCA-1 functions with γ-tubulin and in parallel to Patronin to assemble non-centrosomal microtubule arrays in C. elegans

    PubMed Central

    Wang, Shaohe; Wu, Di; Quintin, Sophie; Green, Rebecca A; Cheerambathur, Dhanya K; Ochoa, Stacy D; Desai, Arshad; Oegema, Karen

    2015-01-01

    Non-centrosomal microtubule arrays assemble in differentiated tissues to perform mechanical and transport-based functions. In this study, we identify Caenorhabditis elegans NOCA-1 as a protein with homology to vertebrate ninein. NOCA-1 contributes to the assembly of non-centrosomal microtubule arrays in multiple tissues. In the larval epidermis, NOCA-1 functions redundantly with the minus end protection factor Patronin/PTRN-1 to assemble a circumferential microtubule array essential for worm growth and morphogenesis. Controlled degradation of a γ-tubulin complex subunit in this tissue revealed that γ-tubulin acts with NOCA-1 in parallel to Patronin/PTRN-1. In the germline, NOCA-1 and γ-tubulin co-localize at the cell surface, and inhibiting either leads to a microtubule assembly defect. γ-tubulin targets independently of NOCA-1, but NOCA-1 targeting requires γ-tubulin when a non-essential putatively palmitoylated cysteine is mutated. These results show that NOCA-1 acts with γ-tubulin to assemble non-centrosomal arrays in multiple tissues and highlight functional overlap between the ninein and Patronin protein families. DOI: http://dx.doi.org/10.7554/eLife.08649.001 PMID:26371552

  2. Massively parallel patterning of complex 2D and 3D functional polymer brushes by polymer pen lithography.

    PubMed

    Xie, Zhuang; Chen, Chaojian; Zhou, Xuechang; Gao, Tingting; Liu, Danqing; Miao, Qian; Zheng, Zijian

    2014-08-13

    We report the first demonstration of centimeter-area serial patterning of complex 2D and 3D functional polymer brushes by high-throughput polymer pen lithography. Arbitrary 2D and 3D structures of poly(glycidyl methacrylate) (PGMA) brushes are fabricated over areas as large as 2 cm × 1 cm, with a remarkable throughput being 3 orders of magnitudes higher than the state-of-the-arts. Patterned PGMA brushes are further employed as resist for fabricating Au micro/nanostructures and hard molds for the subsequent replica molding of soft stamps. On the other hand, these 2D and 3D PGMA brushes are also utilized as robust and versatile platforms for the immobilization of bioactive molecules to form 2D and 3D patterned DNA oligonucleotide and protein chips. Therefore, this low-cost, yet high-throughput "bench-top" serial fabrication method can be readily applied to a wide range of fields including micro/nanofabrication, optics and electronics, smart surfaces, and biorelated studies.

  3. Parallel FoxP1 and FoxP2 expression in songbird and human brain predicts functional interaction.

    PubMed

    Teramitsu, Ikuko; Kudo, Lili C; London, Sarah E; Geschwind, Daniel H; White, Stephanie A

    2004-03-31

    Humans and songbirds are two of the rare animal groups that modify their innate vocalizations. The identification of FOXP2 as the monogenetic locus of a human speech disorder exhibited by members of the family referred to as KE enables the first examination of whether molecular mechanisms for vocal learning are shared between humans and songbirds. Here, in situ hybridization analyses for FoxP1 and FoxP2 in a songbird reveal a corticostriatal expression pattern congruent with the abnormalities in brain structures of affected KE family members. The overlap in FoxP1 and FoxP2 expression observed in the songbird suggests that combinatorial regulation by these molecules during neural development and within vocal control structures may occur. In support of this idea, we find that FOXP1 and FOXP2 expression patterns in human fetal brain are strikingly similar to those in the songbird, including localization to subcortical structures that function in sensorimotor integration and the control of skilled, coordinated movement. The specific colocalization of FoxP1 and FoxP2 found in several structures in the bird and human brain predicts that mutations in FOXP1 could also be related to speech disorders.

  4. Massively Parallel Genetics.

    PubMed

    Shendure, Jay; Fields, Stanley

    2016-06-01

    Human genetics has historically depended on the identification of individuals whose natural genetic variation underlies an observable trait or disease risk. Here we argue that new technologies now augment this historical approach by allowing the use of massively parallel assays in model systems to measure the functional effects of genetic variation in many human genes. These studies will help establish the disease risk of both observed and potential genetic variants and to overcome the problem of "variants of uncertain significance."

  5. Parallel pivoting combined with parallel reduction

    NASA Technical Reports Server (NTRS)

    Alaghband, Gita

    1987-01-01

    Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.

  6. Parallel Programming in the Age of Ubiquitous Parallelism

    NASA Astrophysics Data System (ADS)

    Pingali, Keshav

    2014-04-01

    Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs

  7. Ultrascalable petaflop parallel supercomputer

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  8. FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

    NASA Astrophysics Data System (ADS)

    Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

    2009-12-01

    FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems. Program summaryProgram title: FILMPAR Catalogue identifier: AEEL_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 530 421 No. of bytes in distributed program, including test data, etc.: 1 960 313 Distribution format: tar.gz Programming language: C++ and MPI Computer: Desktop, server Operating system: Unix/Linux Mac OS X Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors RAM: 512 MBytes Classification: 12 External routines: GNU C/C++, MPI Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering

  9. The electron signature of parallel electric fields

    NASA Astrophysics Data System (ADS)

    Burch, J. L.; Gurgiolo, C.; Menietti, J. D.

    1990-12-01

    Dynamics Explorer I High-Altitude Plasma Instrument electron data are presented. The electron distribution functions have characteristics expected of a region of parallel electric fields. The data are consistent with previous test-particle simulations for observations within parallel electric field regions which indicate that typical hole, bump, and loss-cone electron distributions, which contain evidence for parallel potential differences both above and below the point of observation, are not expected to occur in regions containing actual parallel electric fields.

  10. Development and study of a parallel algorithm of iteratively forming latent functionally-determined structures for classification and analysis of meteorological data

    NASA Astrophysics Data System (ADS)

    Sorokin, V. A.; Volkov, Yu V.; Sherstneva, A. I.; Botygin, I. A.

    2016-11-01

    This paper overviews a method of generating climate regions based on an analytic signal theory. When applied to atmospheric surface layer temperature data sets, the method allows forming climatic structures with the corresponding changes in the temperature to make conclusions on the uniformity of climate in an area and to trace the climate changes in time by analyzing the type group shifts. The algorithm is based on the fact that the frequency spectrum of the thermal oscillation process is narrow-banded and has only one mode for most weather stations. This allows using the analytic signal theory, causality conditions and introducing an oscillation phase. The annual component of the phase, being a linear function, was removed by the least squares method. The remaining phase fluctuations allow consistent studying of their coordinated behavior and timing, using the Pearson correlation coefficient for dependence evaluation. This study includes program experiments to evaluate the calculation efficiency in the phase grouping task. The paper also overviews some single-threaded and multi-threaded computing models. It is shown that the phase grouping algorithm for meteorological data can be parallelized and that a multi-threaded implementation leads to a 25-30% increase in the performance.

  11. SDHA loss-of-function mutations in KIT-PDGFRA wild-type gastrointestinal stromal tumors identified by massively parallel sequencing.

    PubMed

    Pantaleo, Maria A; Astolfi, Annalisa; Indio, Valentina; Moore, Richard; Thiessen, Nina; Heinrich, Michael C; Gnocchi, Chiara; Santini, Donatella; Catena, Fausto; Formica, Serena; Martelli, Pier Luigi; Casadio, Rita; Pession, Andrea; Biasco, Guido

    2011-06-22

    Approximately 10%-15% of gastrointestinal stromal tumors (GISTs) in adults do not harbor any mutation in the KIT or PDGFRA genes (ie, KIT/PDGFRA wild-type GISTs). Recently, mutations in SDHB and SDHC (which encode succinate dehydrogenase subunits B and C, respectively) but not in SDHA and SDHD (which encode subunits A and D, respectively) were identified in KIT/PDGFRA wild-type GISTs. To search for novel pathogenic mutations, we sequenced the tumor transcriptome of two young adult patients who developed sporadic KIT/PDGFRA wild-type GISTs by using a massively parallel sequencing approach. The only variants identified as disease related by computational analysis were in SDHA. One patient carried the homozygous nonsense mutation p.Ser384X, the other patient was a compound heterozygote harboring a p.Arg31X nonsense mutation and a p.Arg589Trp missense mutation. The heterozygous nonsense mutations in both patients were present in germline DNA isolated from peripheral blood. Protein structure analysis indicates that all three mutations lead to functional inactivation of the protein. This is the first report, to our knowle dge, that identifies SDHA inactivation as a common oncogenic event in GISTs that lack a mutation in KIT and PDGFRA.

  12. Executive functioning as a mediator of conduct problems prevention in children of homeless families residing in temporary supportive housing: a parallel process latent growth modeling approach.

    PubMed

    Piehler, Timothy F; Bloomquist, Michael L; August, Gerald J; Gewirtz, Abigail H; Lee, Susanne S; Lee, Wendy S C

    2014-01-01

    A culturally diverse sample of formerly homeless youth (ages 6-12) and their families (n = 223) participated in a cluster randomized controlled trial of the Early Risers conduct problems prevention program in a supportive housing setting. Parents provided 4 annual behaviorally-based ratings of executive functioning (EF) and conduct problems, including at baseline, over 2 years of intervention programming, and at a 1-year follow-up assessment. Using intent-to-treat analyses, a multilevel latent growth model revealed that the intervention group demonstrated reduced growth in conduct problems over the 4 assessment points. In order to examine mediation, a multilevel parallel process latent growth model was used to simultaneously model growth in EF and growth in conduct problems along with intervention status as a covariate. A significant mediational process emerged, with participation in the intervention promoting growth in EF, which predicted negative growth in conduct problems. The model was consistent with changes in EF fully mediating intervention-related changes in youth conduct problems over the course of the study. These findings highlight the critical role that EF plays in behavioral change and lends further support to its importance as a target in preventive interventions with populations at risk for conduct problems.

  13. Special parallel processing workshop

    SciTech Connect

    1994-12-01

    This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concept detailing with parallel processing.

  14. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped down version of EMYCIN.

  15. Matpar: Parallel Extensions for MATLAB

    NASA Technical Reports Server (NTRS)

    Springer, P. L.

    1998-01-01

    Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.

  16. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory. and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP`s abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume render use a MIMD approach. Implementations for these algorithms are presented for the Thinking Ma.chines Corporation CM-5 MPP.

  17. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  18. Distinguishing serial and parallel parsing.

    PubMed

    Gibson, E; Pearlmutter, N J

    2000-03-01

    This paper discusses ways of determining whether the human parser is serial maintaining at most, one structural interpretation at each parse state, or whether it is parallel, maintaining more than one structural interpretation in at least some circumstances. We make four points. The first two counterclaims made by Lewis (2000): (1) that the availability of alternative structures should not vary as a function of the disambiguating material in some ranked parallel models; and (2) that parallel models predict a slow down during the ambiguous region for more syntactically ambiguous structures. Our other points concern potential methods for seeking experimental evidence relevant to the serial/parallel question. We discuss effects of the plausibility of a secondary structure in the ambiguous region (Pearlmutter & Mendelsohn, 1999) and suggest examining the distribution of reaction times in the disambiguating region.

  19. Parallel flow diffusion battery

    DOEpatents

    Yeh, Hsu-Chi; Cheng, Yung-Sung

    1984-08-07

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  20. Parallel flow diffusion battery

    DOEpatents

    Yeh, H.C.; Cheng, Y.S.

    1984-01-01

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  1. Parallel processing ITS

    SciTech Connect

    Fan, W.C.; Halbleib, J.A. Sr.

    1996-09-01

    This report provides a users` guide for parallel processing ITS on a UNIX workstation network, a shared-memory multiprocessor or a massively-parallel processor. The parallelized version of ITS is based on a master/slave model with message passing. Parallel issues such as random number generation, load balancing, and communication software are briefly discussed. Timing results for example problems are presented for demonstration purposes.

  2. Research in parallel computing

    NASA Technical Reports Server (NTRS)

    Ortega, James M.; Henderson, Charles

    1994-01-01

    This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.

  3. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  4. Multi-slice parallel transmission three-dimensional tailored RF (PTX 3DTRF) pulse design for signal recovery in ultra high field functional MRI

    NASA Astrophysics Data System (ADS)

    Zheng, Hai; Zhao, Tiejun; Qian, Yongxian; Schirda, Claudiu; Ibrahim, Tamer S.; Boada, Fernando E.

    2013-03-01

    T2∗ weighted fMRI at high and ultra high field (UHF) is often hampered by susceptibility-induced, through-plane, signal loss. Three-dimensional tailored RF (3DTRF) pulses have been shown to be an effective approach for mitigating through-plane signal loss at UHF. However, the required RF pulse lengths are too long for practical applications. Recently, parallel transmission (PTX) has emerged as a very effective means for shortening the RF pulse duration for 3DTRF without sacrificing the excitation performance. In this article, we demonstrate a RF pulse design strategy for 3DTRF based on the use of multi-slice PTX 3DTRF to simultaneously and precisely recover signal with whole-brain coverage. Phantom and human experiments are used to demonstrate the effectiveness and robustness of the proposed method on three subjects using an eight-channel whole body parallel transmission system.

  5. Parallel algorithm development

    SciTech Connect

    Adams, T.F.

    1996-06-01

    Rapid changes in parallel computing technology are causing significant changes in the strategies being used for parallel algorithm development. One approach is simply to write computer code in a standard language like FORTRAN 77 or with the expectation that the compiler will produce executable code that will run in parallel. The alternatives are: (1) to build explicit message passing directly into the source code; or (2) to write source code without explicit reference to message passing or parallelism, but use a general communications library to provide efficient parallel execution. Application of these strategies is illustrated with examples of codes currently under development.

  6. Parallel Implicit Algorithms for CFD

    NASA Technical Reports Server (NTRS)

    Keyes, David E.

    1998-01-01

    The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD.) "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in Massively Parallel Integration (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.

  7. Parallel Atomistic Simulations

    SciTech Connect

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.

  8. Linearly exact parallel closures for slab geometry

    NASA Astrophysics Data System (ADS)

    Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

    2013-08-01

    Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).

  9. Parallel digital forensics infrastructure.

    SciTech Connect

    Liebrock, Lorie M.; Duggan, David Patrick

    2009-10-01

    This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the implementation of the parallel digital forensics (PDF) infrastructure architecture and implementation.

  10. Introduction to Parallel Computing

    DTIC Science & Technology

    1992-05-01

    Topology C, Ada, C++, Data-parallel FORTRAN, 2D mesh of node boards, each node FORTRAN-90 (late 1992) board has 1 application processor Devopment Tools ...parallel machines become the wave of the present, tools are increasingly needed to assist programmers in creating parallel tasks and coordinating...their activities. Linda was designed to be such a tool . Linda was designed with three important goals in mind: to be portable, efficient, and easy to use

  11. Parallel Wolff Cluster Algorithms

    NASA Astrophysics Data System (ADS)

    Bae, S.; Ko, S. H.; Coddington, P. D.

    The Wolff single-cluster algorithm is the most efficient method known for Monte Carlo simulation of many spin models. Due to the irregular size, shape and position of the Wolff clusters, this method does not easily lend itself to efficient parallel implementation, so that simulations using this method have thus far been confined to workstations and vector machines. Here we present two parallel implementations of this algorithm, and show that one gives fairly good performance on a MIMD parallel computer.

  12. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also be used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results show from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.

  13. Application Portable Parallel Library

    NASA Technical Reports Server (NTRS)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  14. Parallel Algorithms and Patterns

    SciTech Connect

    Robey, Robert W.

    2016-06-16

    This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.

  15. A parallel variable metric optimization algorithm

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.

    1973-01-01

    An algorithm, designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.

  16. Social Problems and Deviance: Some Parallel Issues

    ERIC Educational Resources Information Center

    Kitsuse, John I.; Spector, Malcolm

    1975-01-01

    Explores parallel developments in labeling theory and in the value conflict approach to social problems. Similarities in their critiques of functionalism and etiological theory as well as their emphasis on the definitional process are noted. (Author)

  17. Parallel Lisp simulator

    SciTech Connect

    Weening, J.S.

    1988-05-01

    CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.

  18. Parallel and Distributed Computing.

    DTIC Science & Technology

    1986-12-12

    program was devoted to parallel and distributed computing . Support for this part of the program was obtained from the present Army contract and a...Umesh Vazirani. A workshop on parallel and distributed computing was held from May 19 to May 23, 1986 and drew 141 participants. Keywords: Mathematical programming; Protocols; Randomized algorithms. (Author)

  19. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.

  20. Totally parallel multilevel algorithms

    NASA Technical Reports Server (NTRS)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which are referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  1. Parallel computing works

    SciTech Connect

    Not Available

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  2. Trajectories in parallel optics.

    PubMed

    Klapp, Iftach; Sochen, Nir; Mendlovic, David

    2011-10-01

    In our previous work we showed the ability to improve the optical system's matrix condition by optical design, thereby improving its robustness to noise. It was shown that by using singular value decomposition, a target point-spread function (PSF) matrix can be defined for an auxiliary optical system, which works parallel to the original system to achieve such an improvement. In this paper, after briefly introducing the all optics implementation of the auxiliary system, we show a method to decompose the target PSF matrix. This is done through a series of shifted responses of auxiliary optics (named trajectories), where a complicated hardware filter is replaced by postprocessing. This process manipulates the pixel confined PSF response of simple auxiliary optics, which in turn creates an auxiliary system with the required PSF matrix. This method is simulated on two space variant systems and reduces their system condition number from 18,598 to 197 and from 87,640 to 5.75, respectively. We perform a study of the latter result and show significant improvement in image restoration performance, in comparison to a system without auxiliary optics and to other previously suggested hybrid solutions. Image restoration results show that in a range of low signal-to-noise ratio values, the trajectories method gives a significant advantage over alternative approaches. A third space invariant study case is explored only briefly, and we present a significant improvement in the matrix condition number from 1.9160e+013 to 34,526.

  3. Parallel Computational Protein Design

    PubMed Central

    Zhou, Yichao; Donald, Bruce R.; Zeng, Jianyang

    2016-01-01

    Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that guarantees to find the global minimum energy solution (GMEC) is to combine both dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computation bottleneck of large-scale computational protein design process. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab [1] to implement a GPU-based massively parallel A* algorithm for improving protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedups in large protein design cases with a small memory overhead comparing to the traditional A* search algorithm implementation, while still guaranteeing the optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle the problems in which the conformation space is too large and the global optimal solution cannot be computed previously. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with the state-of-the-art rotamer pruning algorithms such as iMinDEE [2] and DEEPer [3] to also consider continuous backbone and side-chain flexibility. PMID:27914056

  4. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  5. PALM: a Parallel Dynamic Coupler

    NASA Astrophysics Data System (ADS)

    Thevenin, A.; Morel, T.

    2008-12-01

    In order to efficiently represent complex systems, numerical modeling has to rely on many physical models at a time: an ocean model coupled with an atmospheric model is at the basis of climate modeling. The continuity of the solution is granted only if these models can constantly exchange information. PALM is a coupler allowing the concurrent execution and the intercommunication of programs not having been especially designed for that. With PALM, the dynamic coupling approach is introduced: a coupled component can be launched and can release computers' resources upon termination at any moment during the simulation. In order to exploit as much as possible computers' possibilities, the PALM coupler handles two levels of parallelism. The first level concerns the components themselves. While managing the resources, PALM allocates the number of processes which are necessary to any coupled component. These models can be parallel programs based on domain decomposition with MPI or applications multithreaded with OpenMP. The second level of parallelism is a task parallelism: one can define a coupling algorithm allowing two or more programs to be executed in parallel. PALM applications are implemented via a Graphical User Interface called PrePALM. In this GUI, the programmer initially defines the coupling algorithm then he describes the actual communications between the models. PALM offers a very high flexibility for testing different coupling techniques and for reaching the best load balance in a high performance computer. The transformation of computational independent code is almost straightforward. The other qualities of PALM are its easy set-up, its flexibility, its performances, the simple updates and evolutions of the coupled application and the many side services and functions that it offers.

  6. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. In includes both tutorial and reference material. It also presents the basic concepts that underly PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).

  7. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-09-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, a set of tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory at info.mcs.anl.gov.

  8. The Parallel Axiom

    ERIC Educational Resources Information Center

    Rogers, Pat

    1972-01-01

    Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclids parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)

  9. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth

  10. Parallel image compression

    NASA Technical Reports Server (NTRS)

    Reif, John H.

    1987-01-01

    A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless test compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.

  11. Revisiting and parallelizing SHAKE

    NASA Astrophysics Data System (ADS)

    Weinbach, Yael; Elber, Ron

    2005-10-01

    An algorithm is presented for running SHAKE in parallel. SHAKE is a widely used approach to compute molecular dynamics trajectories with constraints. An essential step in SHAKE is the solution of a sparse linear problem of the type Ax = b, where x is a vector of unknowns. Conjugate gradient minimization (that can be done in parallel) replaces the widely used iteration process that is inherently serial. Numerical examples present good load balancing and are limited only by communication time.

  12. The Function of the Glutamate-Nitric Oxide-cGMP Pathway in Brain in Vivo and Learning Ability Decrease in Parallel in Mature Compared with Young Rats

    ERIC Educational Resources Information Center

    Piedrafita, Blanca; Cauli, Omar; Montoliu, Carmina; Felipo, Vicente

    2007-01-01

    Aging is associated with cognitive impairment, but the underlying mechanisms remain unclear. We have recently reported that the ability of rats to learn a Y-maze conditional discrimination task depends on the function of the glutamate-nitric oxide-cGMP pathway in brain. The aims of the present work were to assess whether the ability of rats to…

  13. Processing Semblances Induced through Inter-Postsynaptic Functional LINKs, Presumed Biological Parallels of K-Lines Proposed for Building Artificial Intelligence.

    PubMed

    Vadakkan, Kunjumon I

    2011-01-01

    The internal sensation of memory, which is available only to the owner of an individual nervous system, is difficult to analyze for its basic elements of operation. We hypothesize that associative learning induces the formation of functional LINK between the postsynapses. During memory retrieval, the activation of either postsynapse re-activates the functional LINK evoking a semblance of sensory activity arriving at its opposite postsynapse, nature of which defines the basic unit of internal sensation - namely, the semblion. In neuronal networks that undergo continuous oscillatory activity at certain levels of their organization re-activation of functional LINKs is expected to induce semblions, enabling the system to continuously learn, self-organize, and demonstrate instantiation, features that can be utilized for developing artificial intelligence (AI). This paper also explains suitability of the inter-postsynaptic functional LINKs to meet the expectations of Minsky's K-lines, basic elements of a memory theory generated to develop AI and methods to replicate semblances outside the nervous system.

  14. Expression of mitochondrial regulatory genes parallels respiratory capacity and contractile function in a rat model of hypoxia-induced right ventricular hypertrophy

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Chronic hypobaric hypoxia (CHH) increases load on the right ventricle (RV) resulting in RV hypertrophy. We hypothesized that CHH elicits distinct responses, i.e., the hypertrophied RV, unlike the left ventricle (LV), displaying enhanced mitochondrial respiratory and contractile function. Wistar rats...

  15. Processing Semblances Induced through Inter-Postsynaptic Functional LINKs, Presumed Biological Parallels of K-Lines Proposed for Building Artificial Intelligence

    PubMed Central

    Vadakkan, Kunjumon I.

    2011-01-01

    The internal sensation of memory, which is available only to the owner of an individual nervous system, is difficult to analyze for its basic elements of operation. We hypothesize that associative learning induces the formation of functional LINK between the postsynapses. During memory retrieval, the activation of either postsynapse re-activates the functional LINK evoking a semblance of sensory activity arriving at its opposite postsynapse, nature of which defines the basic unit of internal sensation – namely, the semblion. In neuronal networks that undergo continuous oscillatory activity at certain levels of their organization re-activation of functional LINKs is expected to induce semblions, enabling the system to continuously learn, self-organize, and demonstrate instantiation, features that can be utilized for developing artificial intelligence (AI). This paper also explains suitability of the inter-postsynaptic functional LINKs to meet the expectations of Minsky’s K-lines, basic elements of a memory theory generated to develop AI and methods to replicate semblances outside the nervous system. PMID:21845180

  16. Unique roles of glucagon and glucagon-like peptides: Parallels in understanding the functions of adipokinetic hormones in stress responses in insects.

    PubMed

    Bednářová, Andrea; Kodrík, Dalibor; Krishnan, Natraj

    2013-01-01

    Glucagon is conventionally regarded as a hormone, counter regulatory in function to insulin and plays a critical anti-hypoglycemic role by maintaining glucose homeostasis in both animals and humans. Glucagon performs this function by increasing hepatic glucose output to the blood by stimulating glycogenolysis and gluconeogenesis in response to starvation. Additionally it plays a homeostatic role by decreasing glycogenesis and glycolysis in tandem to try and maintain optimal glucose levels. To perform this action, it also increases energy expenditure which is contrary to what one would expect and has actions which are unique and not entirely in agreement with its role in protection from hypoglycemia. Interestingly, glucagon-like peptides (GLP-1 and GLP-2) from the major fragment of proglucagon (in non-mammalian vertebrates, as well as in mammals) may also modulate response to stress in addition to their other physiological actions. These unique modes of action occur in response to psychological, metabolic and other stress situations and mirror the role of adipokinetic hormones (AKHs) in insects which perform a similar function. The findings on the anti-stress roles of glucagon and glucagon-like peptides in mammalian and non-mammalian vertebrates may throw light on the multiple stress responsive mechanisms which operate in a concerted manner under regulation by AKH in insects thus functioning as a stress responsive hormone while also maintaining organismal homeostasis.

  17. HOPSPACK: Hybrid Optimization Parallel Search Package.

    SciTech Connect

    Gray, Genetha Anne.; Kolda, Tamara G.; Griffin, Joshua; Taddy, Matt; Martinez-Canales, Monica L.

    2008-12-01

    In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel SearchPackage), a new software platform which facilitates combining multiple optimization routines into asingle, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The frameworkis designed such that existing optimization source code can be easily incorporated with minimalcode modification. By maintaining the integrity of each individual solver, the strengths and codesophistication of the original optimization package are retained and exploited.4

  18. Parallel reduction in expression, but no loss of functional constraint, in two opsin paralogs within cave populations of Gammarus minus (Crustacea: Amphipoda)

    PubMed Central

    2013-01-01

    Background Gammarus minus, a freshwater amphipod living in the cave and surface streams in the eastern USA, is a premier candidate for studying the evolution of troglomorphic traits such as pigmentation loss, elongated appendages, and reduced eyes. In G. minus, multiple pairs of genetically related, physically proximate cave and surface populations exist which exhibit a high degree of intraspecific morphological divergence. The morphology, ecology, and genetic structure of these sister populations are well characterized, yet the genetic basis of their morphological divergence remains unknown. Results We used degenerate PCR primers designed to amplify opsin genes within the subphylum Crustacea and discovered two distinct opsin paralogs (average inter-paralog protein divergence ≈ 20%) in the genome of three independently derived pairs of G. minus cave and surface populations. Both opsin paralogs were found to be related to other crustacean middle wavelength sensitive opsins. Low levels of nucleotide sequence variation (< 1% within populations) were detected in both opsin genes, regardless of habitat, and dN/dS ratios did not indicate a relaxation of functional constraint in the cave populations with reduced or absent eyes. Maximum likelihood analyses using codon-based models also did not detect a relaxation of functional constraint in the cave lineages. We quantified expression level of both opsin genes and found that the expression of both paralogs was significantly reduced in all three cave populations relative to their sister surface populations. Conclusions The concordantly lowered expression level of both opsin genes in cave populations of G. minus compared to sister surface populations, combined with evidence for persistent purifying selection in the cave populations, is consistent with an unspecified pleiotropic function of opsin proteins. Our results indicate that phototransduction proteins such as opsins may have retained their function in cave

  19. Parallel effects of β-adrenoceptor blockade on cardiac function and fatty acid oxidation in the diabetic heart: Confronting the maze

    PubMed Central

    Sharma, Vijay; McNeill, John H

    2011-01-01

    Diabetic cardiomyopathy is a disease process in which diabetes produces a direct and continuous myocardial insult even in the absence of ischemic, hypertensive or valvular disease. The β-blocking agents bisoprolol, carvedilol and metoprolol have been shown in large-scale randomized controlled trials to reduce heart failure mortality. In this review, we summarize the results of our studies investigating the effects of β-blocking agents on cardiac function and metabolism in diabetic heart failure, and the complex inter-related mechanisms involved. Metoprolol inhibits fatty acid oxidation at the mitochondrial level but does not prevent lipotoxicity; its beneficial effects are more likely to be due to pro-survival effects of chronic treatment. These studies have expanded our understanding of the range of effects produced by β-adrenergic blockade and show how interconnected the signaling pathways of function and metabolism are in the heart. Although our initial hypothesis that inhibition of fatty acid oxidation would be a key mechanism of action was disproved, unexpected results led us to some intriguing regulatory mechanisms of cardiac metabolism. The first was upstream stimulatory factor-2-mediated repression of transcriptional master regulator PGC-1α, most likely occurring as a consequence of the improved function; it is unclear whether this effect is unique to β-blockers, although repression of carnitine palmitoyltransferase (CPT)-1 has not been reported with other drugs which improve function. The second was the identification of a range of covalent modifications which can regulate CPT-1 directly, mediated by a signalome at the level of the mitochondria. We also identified an important interaction between β-adrenergic signaling and caveolins, which may be a key mechanism of action of β-adrenergic blockade. Our experience with this labyrinthine signaling web illustrates that initial hypotheses and anticipated directions do not have to be right in order to

  20. Parallelism in integrated fluidic circuits

    NASA Astrophysics Data System (ADS)

    Bousse, Luc J.; Kopf-Sill, Anne R.; Parce, J. W.

    1998-04-01

    Many research groups around the world are working on integrated microfluidics. The goal of these projects is to automate and integrate the handling of liquid samples and reagents for measurement and assay procedures in chemistry and biology. Ultimately, it is hoped that this will lead to a revolution in chemical and biological procedures similar to that caused in electronics by the invention of the integrated circuit. The optimal size scale of channels for liquid flow is determined by basic constraints to be somewhere between 10 and 100 micrometers . In larger channels, mixing by diffusion takes too long; in smaller channels, the number of molecules present is so low it makes detection difficult. At Caliper, we are making fluidic systems in glass chips with channels in this size range, based on electroosmotic flow, and fluorescence detection. One application of this technology is rapid assays for drug screening, such as enzyme assays and binding assays. A further challenge in this area is to perform multiple functions on a chip in parallel, without a large increase in the number of inputs and outputs. A first step in this direction is a fluidic serial-to-parallel converter. Fluidic circuits will be shown with the ability to distribute an incoming serial sample stream to multiple parallel channels.

  1. Parallel Environment for Quantum Computing

    NASA Astrophysics Data System (ADS)

    Tabakin, Frank; Diaz, Bruno Julia

    2009-03-01

    To facilitate numerical study of noise and decoherence in QC algorithms,and of the efficacy of error correction schemes, we have developed a Fortran 90 quantum computer simulator with parallel processing capabilities. It permits rapid evaluation of quantum algorithms for a large number of qubits and for various ``noise'' scenarios. State vectors are distributed over many processors, to employ a large number of qubits. Parallel processing is implemented by the Message-Passing Interface protocol. A description of how to spread the wave function components over many processors, along with how to efficiently describe the action of general one- and two-qubit operators on these state vectors will be delineated.Grover's search and Shor's factoring algorithms with noise will be discussed as examples. A major feature of this work is that concurrent versions of the algorithms can be evaluated with each version subject to diverse noise effects, corresponding to solving a stochastic Schrodinger equation. The density matrix for the ensemble of such noise cases is constructed using parallel distribution methods to evaluate its associated entropy. Applications of this powerful tool is made to delineate the stability and correction of QC processes using Hamiltonian based dynamics.

  2. Parallel architectures for vision

    SciTech Connect

    Maresca, M. ); Lavin, M.A. ); Li, H. )

    1988-08-01

    Vision computing involves the execution of a large number of operations on large sets of structured data. Sequential computers cannot achieve the speed required by most of the current applications and therefore parallel architectural solutions have to be explored. In this paper the authors examine the options that drive the design of a vision oriented computer, starting with the analysis of the basic vision computation and communication requirements. They briefly review the classical taxonomy for parallel computers, based on the multiplicity of the instruction and data stream, and apply a recently proposed criterion, the degree of autonomy of each processor, to further classify fine-grain SIMD massively parallel computers. They identify three types of processor autonomy, namely operation autonomy, addressing autonomy, and connection autonomy. For each type they give the basic definitions and show some examples. They focus on the concept of connection autonomy, which they believe is a key point in the development of massively parallel architectures for vision. They show two examples of parallel computers featuring different types of connection autonomy - the Connection Machine and the Polymorphic-Torus - and compare their cost and benefit.

  3. Sublattice parallel replica dynamics

    NASA Astrophysics Data System (ADS)

    Martínez, Enrique; Uberuaga, Blas P.; Voter, Arthur F.

    2014-06-01

    Exascale computing presents a challenge for the scientific community as new algorithms must be developed to take full advantage of the new computing paradigm. Atomistic simulation methods that offer full fidelity to the underlying potential, i.e., molecular dynamics (MD) and parallel replica dynamics, fail to use the whole machine speedup, leaving a region in time and sample size space that is unattainable with current algorithms. In this paper, we present an extension of the parallel replica dynamics algorithm [A. F. Voter, Phys. Rev. B 57, R13985 (1998), 10.1103/PhysRevB.57.R13985] by combining it with the synchronous sublattice approach of Shim and Amar [Y. Shim and J. G. Amar, Phys. Rev. B 71, 125432 (2005), 10.1103/PhysRevB.71.125432], thereby exploiting event locality to improve the algorithm scalability. This algorithm is based on a domain decomposition in which events happen independently in different regions in the sample. We develop an analytical expression for the speedup given by this sublattice parallel replica dynamics algorithm and compare it with parallel MD and traditional parallel replica dynamics. We demonstrate how this algorithm, which introduces a slight additional approximation of event locality, enables the study of physical systems unreachable with traditional methodologies and promises to better utilize the resources of current high performance and future exascale computers.

  4. Parallel optical sampler

    DOEpatents

    Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A

    2014-05-20

    An optical sampler includes a first and second 1.times.n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode interconnected Mach-Zehnder Modulator. A method of sampling the optical analog input signal is disclosed.

  5. Parallel re-modeling of EF-1α function: divergent EF-1α genes co-occur with EFL genes in diverse distantly related eukaryotes

    PubMed Central

    2013-01-01

    Background Elongation factor-1α (EF-1α) and elongation factor-like (EFL) proteins are functionally homologous to one another, and are core components of the eukaryotic translation machinery. The patchy distribution of the two elongation factor types across global eukaryotic phylogeny is suggestive of a ‘differential loss’ hypothesis that assumes that EF-1α and EFL were present in the most recent common ancestor of eukaryotes followed by independent differential losses of one of the two factors in the descendant lineages. To date, however, just one diatom and one fungus have been found to have both EF-1α and EFL (dual-EF-containing species). Results In this study, we characterized 35 new EF-1α/EFL sequences from phylogenetically diverse eukaryotes. In so doing we identified 11 previously unreported dual-EF-containing species from diverse eukaryote groups including the Stramenopiles, Apusomonadida, Goniomonadida, and Fungi. Phylogenetic analyses suggested vertical inheritance of both genes in each of the dual-EF lineages. In the dual-EF-containing species we identified, the EF-1α genes appeared to be highly divergent in sequence and suppressed at the transcriptional level compared to the co-occurring EFL genes. Conclusions According to the known EF-1α/EFL distribution, the differential loss process should have occurred independently in diverse eukaryotic lineages, and more dual-EF-containing species remain unidentified. We predict that dual-EF-containing species retain the divergent EF-1α homologues only for a sub-set of the original functions. As the dual-EF-containing species are distantly related to each other, we propose that independent re-modelling of EF-1α function took place in multiple branches in the tree of eukaryotes. PMID:23800323

  6. CRUNCH_PARALLEL

    SciTech Connect

    Shumaker, Dana E.; Steefel, Carl I.

    2016-06-21

    The code CRUNCH_PARALLEL is a parallel version of the CRUNCH code. CRUNCH code version 2.0 was previously released by LLNL, (UCRL-CODE-200063). Crunch is a general purpose reactive transport code developed by Carl Steefel and Yabusake (Steefel Yabsaki 1996). The code handles non-isothermal transport and reaction in one, two, and three dimensions. The reaction algorithm is generic in form, handling an arbitrary number of aqueous and surface complexation as well as mineral dissolution/precipitation. A standardized database is used containing thermodynamic and kinetic data. The code includes advective, dispersive, and diffusive transport.

  7. The NAS Parallel Benchmarks

    SciTech Connect

    Bailey, David H.

    2009-11-15

    The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, LeoDagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage

  8. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Among the highly parallel computing architectures required for advanced scientific computation, those designated 'MIMD' and 'SIMD' have yielded the best results to date. The present development status evaluation of such architectures shown neither to have attained a decisive advantage in most near-homogeneous problems' treatment; in the cases of problems involving numerous dissimilar parts, however, such currently speculative architectures as 'neural networks' or 'data flow' machines may be entailed. Data flow computers are the most practical form of MIMD fine-grained parallel computers yet conceived; they automatically solve the problem of assigning virtual processors to the real processors in the machine.

  9. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

  10. Two-Axis Acceleration of Functional Connectivity Magnetic Resonance Imaging by Parallel Excitation of Phase-Tagged Slices and Half k-Space Acceleration

    PubMed Central

    Jesmanowicz, Andrzej; Nencka, Andrew S.; Li, Shi-Jiang

    2011-01-01

    Abstract Whole brain functional connectivity magnetic resonance imaging requires acquisition of a time course of gradient-recalled (GR) volumetric images. A method is developed to accelerate this acquisition using GR echo-planar imaging and radio frequency (RF) slice phase tagging. For N-fold acceleration, a tailored RF pulse excites N slices using a uniform-field transmit coil. This pulse is the Fourier transform of the profile for the N slices with a predetermined RF phase tag on each slice. A multichannel RF receive coil is used for detection. For n slices, there are n/N groups of slices. Signal-averaged reference images are created for each slice within each slice group for each member of the coil array and used to separate overlapping images that are simultaneously received. The time-overhead for collection of reference images is small relative to the acquisition time of a complete volumetric time course. A least-squares singular value decomposition method allows image separation on a pixel-by-pixel basis. Twofold slice acceleration is demonstrated using an eight-channel RF receive coil, with application to resting-state functional magnetic resonance imaging in the human brain. Data from six subjects at 3 T are reported. The method has been extended to half k-space acquisition, which not only provides additional acceleration, but also facilitates slice separation because of increased signal intensity of the central lines of k-space coupled with reduced susceptibility effects. PMID:22432957

  11. Parallel Coordinate Axes.

    ERIC Educational Resources Information Center

    Friedlander, Alex; And Others

    1982-01-01

    Several methods of numerical mappings other than the usual cartesian coordinate system are considered. Some examples using parallel axes representation, which are seen to lead to aesthetically pleasing or interesting configurations, are presented. Exercises with alternative representations can stimulate pupil imagination and exploration in…

  12. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and Cthat allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs. ani.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  13. Parallel Dislocation Simulator

    SciTech Connect

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  14. Massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Fung, L. W. (Inventor)

    1983-01-01

    An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.

  15. High performance parallel architectures

    SciTech Connect

    Anderson, R.E. )

    1989-09-01

    In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user programmer's point-of-view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.

  16. Parallel fast gauss transform

    SciTech Connect

    Sampath, Rahul S; Sundar, Hari; Veerapaneni, Shravan

    2010-01-01

    We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N{sup 2}) time. The parallel time complexity estimates for our algorithms are O(N/n{sub p}) for uniform point distributions and O( (N/n{sub p}) log (N/n{sub p}) + n{sub p}log n{sub p}) for non-uniform distributions using n{sub p} CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits 'diagonal translation'. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer. Our implementation is 'kernel-independent' and can handle other 'Gaussian-type' kernels even when explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.

  17. Parallel hierarchical radiosity rendering

    SciTech Connect

    Carter, M.

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  18. Parallel hierarchical global illumination

    SciTech Connect

    Snell, Quinn O.

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  19. Parallel Multigrid Equation Solver

    SciTech Connect

    Adams, Mark

    2001-09-07

    Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved problems of up to 76 mullion degrees of feedom, problems in linear elasticity on the ASCI blue pacific and ASCI red machines.

  20. Asynchronous parallel pattern search for nonlinear optimization

    SciTech Connect

    P. D. Hough; T. G. Kolda; V. J. Torczon

    2000-01-01

    Parallel pattern search (PPS) can be quite useful for engineering optimization problems characterized by a small number of variables (say 10--50) and by expensive objective function evaluations such as complex simulations that take from minutes to hours to run. However, PPS, which was originally designed for execution on homogeneous and tightly-coupled parallel machine, is not well suited to the more heterogeneous, loosely-coupled, and even fault-prone parallel systems available today. Specifically, PPS is hindered by synchronization penalties and cannot recover in the event of a failure. The authors introduce a new asynchronous and fault tolerant parallel pattern search (AAPS) method and demonstrate its effectiveness on both simple test problems as well as some engineering optimization problems

  1. Suggestions to Reduce Clinical Fibromyalgia Pain and Experimentally Induced Pain Produce Parallel Effects on Perceived Pain but Divergent Functional MRI–Based Brain Activity

    PubMed Central

    Derbyshire, Stuart W.G.; Whalley, Matthew G.; Seah, Stanley T.H.; Oakley, David A.

    2017-01-01

    ABSTRACT Objective Hypnotic suggestion is an empirically validated form of pain control; however, the underlying mechanism remains unclear. Methods Thirteen fibromyalgia patients received suggestions to alter their clinical pain, and 15 healthy controls received suggestions to alter experimental heat pain. Suggestions were delivered before and after hypnotic induction with blood oxygen level–dependent (BOLD) activity measured concurrently. Results Across groups, suggestion produced substantial changes in pain report (main effect of suggestion, F2, 312 = 585.8; p < .0001), with marginally larger changes after induction (main effect of induction, F1, 312 = 3.6; p = .060). In patients, BOLD response increased with pain report in regions previously associated with pain, including thalamus and anterior cingulate cortex. In controls, BOLD response decreased with pain report. All changes were greater after induction. Region-of-interest analysis revealed largely linear patient responses with increasing pain report. Control responses, however, were higher after suggestion to increase or decrease pain from baseline. Conclusions Based on behavioral report alone, the mechanism of suggestion could be interpreted as largely similar regardless of the induction or type of pain experience. The functional magnetic resonance imaging data, however, demonstrated larger changes in brain activity after induction and a radically different pattern of brain activity for clinical pain compared with experimental pain. These findings imply that induction has an important effect on underlying neural activity mediating the effects of suggestion, and the mechanism of suggestion in patients altering clinical pain differs from that in controls altering experimental pain. Patient responses imply that suggestions altered pain experience via corresponding changes in pain-related brain regions, whereas control responses imply suggestion engaged cognitive control. PMID:27490850

  2. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  3. Hybrid Optimization Parallel Search PACKage

    SciTech Connect

    2009-11-10

    HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.

  4. Parallel grid population

    DOEpatents

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.

  5. Parallel Anisotropic Tetrahedral Adaptation

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Darmofal, David L.

    2008-01-01

    An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic is described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000-1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.

  6. Parallel Subconvolution Filtering Architectures

    NASA Technical Reports Server (NTRS)

    Gray, Andrew A.

    2003-01-01

    These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete- Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than on the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.

  7. Parallel multilevel preconditioners

    SciTech Connect

    Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.

    1989-01-01

    In this paper, we shall report on some techniques for the development of preconditioners for the discrete systems which arise in the approximation of solutions to elliptic boundary value problems. Here we shall only state the resulting theorems. It has been demonstrated that preconditioned iteration techniques often lead to the most computationally effective algorithms for the solution of the large algebraic systems corresponding to boundary value problems in two and three dimensional Euclidean space. The use of preconditioned iteration will become even more important on computers with parallel architecture. This paper discusses an approach for developing completely parallel multilevel preconditioners. In order to illustrate the resulting algorithms, we shall describe the simplest application of the technique to a model elliptic problem.

  8. Homology, convergence and parallelism

    PubMed Central

    Ghiselin, Michael T.

    2016-01-01

    Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. PMID:26598721

  9. Parallel unstructured grid generation

    NASA Technical Reports Server (NTRS)

    Loehner, Rainald; Camberos, Jose; Merriam, Marshal

    1991-01-01

    A parallel unstructured grid generation algorithm is presented and implemented on the Hypercube. Different processor hierarchies are discussed, and the appropraite hierarchies for mesh generation and mesh smoothing are selected. A domain-splitting algorithm for unstructured grids which tries to minimize the surface-to-volume ratio of each subdomain is described. This splitting algorithm is employed both for grid generation and grid smoothing. Results obtained on the Hypercube demonstrate the effectiveness of the algorithms developed.

  10. Development of Parallel GSSHA

    DTIC Science & Technology

    2013-09-01

    C en te r Paul R. Eller , Jing-Ru C. Cheng, Aaron R. Byrd, Charles W. Downer, and Nawa Pradhan September 2013 Approved for public release...Program ERDC TR-13-8 September 2013 Development of Parallel GSSHA Paul R. Eller and Jing-Ru C. Cheng Information Technology Laboratory US Army Engineer...5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) Paul Eller , Ruth Cheng, Aaron Byrd, Chuck Downer, and Nawa Pradhan 5d. PROJECT NUMBER

  11. Xyce parallel electronic simulator.

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  12. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Painter, J.; Hansen, C.

    1996-10-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the M.

  13. Implementation of Parallel Algorithms

    DTIC Science & Technology

    1993-06-30

    their socia ’ relations or to achieve some goals. For example, we define a pair-wise force law of i epulsion and attraction for a group of identical...quantization based compression schemes. Photo-refractive crystals, which provide high density recording in real time, are used as our holographic media . The...of Parallel Algorithms (J. Reif, ed.). Kluwer Academic Pu’ ishers, 1993. (4) "A Dynamic Separator Algorithm", D. Armon and J. Reif. To appear in

  14. Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

    NASA Technical Reports Server (NTRS)

    Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

    1990-01-01

    Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.

  15. Trajectory optimization using parallel shooting method on parallel computer

    SciTech Connect

    Wirthman, D.J.; Park, S.Y.; Vadali, S.R.

    1995-03-01

    The efficiency of a parallel shooting method on a parallel computer for solving a variety of optimal control guidance problems is studied. Several examples are considered to demonstrate that a speedup of nearly 7 to 1 is achieved with the use of 16 processors. It is suggested that further improvements in performance can be achieved by parallelizing in the state domain. 10 refs.

  16. Parallel Function Strategy in Pronoun Assignment

    ERIC Educational Resources Information Center

    Grober, Ellen H.; And Others

    1978-01-01

    Subjects completed sentences of the form NP1 aux V NP2 because (but) Pro...(e.g., John may scold Bill because he...) with a reason or motive for the action described. A basic perceptual strategy was hypothesized to underlie the comprehension of these sentences which have a potentially ambiguous pronoun in the subject position of the subordinate…

  17. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    SciTech Connect

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving`s meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia`s {open_quotes}tiling{close_quotes} dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  18. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.

  19. Asynchronous interpretation of parallel microprograms

    SciTech Connect

    Bandman, O.L.

    1984-03-01

    In this article, the authors demonstrate how to pass from a given synchronous interpretation of a parallel microprogram to an equivalent asynchronous interpretation, and investigate the cost associated with the rejection of external synchronization in parallel microprogram structures.

  20. Status of TRANSP Parallel Services

    NASA Astrophysics Data System (ADS)

    Indireshkumar, K.; Andre, Robert; McCune, Douglas; Randerson, Lewis

    2006-10-01

    The PPPL TRANSP code suite has been used successfully over many years to carry out time dependent simulations of tokamak plasmas. However, accurately modeling certain phenomena such as RF heating and fast ion behavior using TRANSP requires extensive computational power and will benefit from parallelization. Parallelizing all of TRANSP is not required and parts will run sequentially while other parts run parallelized. To efficiently use a site's parallel services, the parallelized TRANSP modules are deployed to a shared ``parallel service'' on a separate cluster. The PPPL Monte Carlo fast ion module NUBEAM and the MIT RF module TORIC are the first TRANSP modules to be so deployed. This poster will show the performance scaling of these modules within the parallel server. Communications between the serial client and the parallel server will be described in detail, and measurements of startup and communications overhead will be shown. Physics modeling benefits for TRANSP users will be assessed.

  1. Resistor Combinations for Parallel Circuits.

    ERIC Educational Resources Information Center

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)

  2. Global Arrays Parallel Programming Toolkit

    SciTech Connect

    Nieplocha, Jaroslaw; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Harrison, Robert J.; Chavarría-Miranda, Daniel

    2011-01-01

    The two predominant classes of programming models for parallel computing are distributed memory and shared memory. Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers this characteristic can have a negative impact on performance and scalability. Careful code restructuring to increase data reuse and replacing fine grain load/stores with block access to shared data can address the problem and yield performance for shared memory that is competitive with message-passing. However, this performance comes at the cost of compromising the ease of use that the shared memory model advertises. Distributed memory models, such as message-passing or one-sided communication, offer performance and scalability but they are difficult to program. The Global Arrays toolkit attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be specified by the programmer and hence managed. GA is related to the global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran. In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF, ZPL, and Data Parallel C. However, the Global Array programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving

  3. SMM parallel battery operation in orbit

    NASA Technical Reports Server (NTRS)

    Broderick, R.

    1982-01-01

    A parallel battery system for the SMM spacecraft is described. The battery system performance as a function of lifetime over orbit was evaluated. The following equipment performance specifications were examined during a typical orbit: battery current and discharges, voltage limitations, battery temperature variations, and current sensor performance. Tabulated battery performance data is also included.

  4. Parallel Tree Contraction and Its Application.

    DTIC Science & Technology

    1985-12-01

    observed by Uspensky [231, see 112]. These bounds are commonly known as Chernoff bounds 16J. We shall use the following simply stated bounds [3. Theorem 6...Functions in Logarithmic Parallel Time. 25th Annual Symp. on Foundations of Computer Science, IEEE, 1984, pp. 12-22. 22. J. Uspensky . Introduction to

  5. Parallel Debugging Using Graphical Views

    DTIC Science & Technology

    1988-03-01

    Voyeur , a prototype system for creating graphical views of parallel programs, provid(s a cost-effective way to construct such views for any parallel...programming system. We illustrate Voyeur by discussing four views created for debugging Poker programs. One is a vteneral trace facility for any Poker...Graphical views are essential for debugging parallel programs because of the large quan- tity of state information contained in parallel programs. Voyeur

  6. Parallel Pascal - An extended Pascal for parallel computers

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.

    1984-01-01

    Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.

  7. CSM parallel structural methods research

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.

    1989-01-01

    Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.

  8. Roo: A parallel theorem prover

    SciTech Connect

    Lusk, E.L.; McCune, W.W.; Slaney, J.K.

    1991-11-01

    We describe a parallel theorem prover based on the Argonne theorem-proving system OTTER. The parallel system, called Roo, runs on shared-memory multiprocessors such as the Sequent Symmetry. We explain the parallel algorithm used and give performance results that demonstrate near-linear speedups on large problems.

  9. Parallel Eclipse Project Checkout

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.

    2011-01-01

    Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (xml file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to checkout the plug-ins, this program makes the plug-in checkouts parallel. After parsing the feature, a request to checkout for each plug-in in the feature has been inserted. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to checkout now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer s home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer s home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. It can be applied to any

  10. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Hansen, C.; Painter, J.; de Verdiere, G.C.

    1995-05-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.

  11. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  12. Programming parallel architectures - The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  13. Constructing higher order DNA origami arrays using DNA junctions of anti-parallel/parallel double crossovers

    NASA Astrophysics Data System (ADS)

    Ma, Zhipeng; Park, Seongsu; Yamashita, Naoki; Kawai, Kentaro; Hirai, Yoshikazu; Tsuchiya, Toshiyuki; Tabata, Osamu

    2016-06-01

    DNA origami provides a versatile method for the construction of nanostructures with defined shape, size and other properties; such nanostructures may enable a hierarchical assembly of large scale architecture for the placement of other nanomaterials with atomic precision. However, the effective use of these higher order structures as functional components depends on knowledge of their assembly behavior and mechanical properties. This paper demonstrates construction of higher order DNA origami arrays with controlled orientations based on the formation of two types of DNA junctions: anti-parallel and parallel double crossovers. A two-step assembly process, in which preformed rectangular DNA origami monomer structures themselves undergo further self-assembly to form numerically unlimited arrays, was investigated to reveal the influences of assembly parameters. AFM observations showed that when parallel double crossover DNA junctions are used, the assembly of DNA origami arrays occurs with fewer monomers than for structures formed using anti-parallel double crossovers, given the same assembly parameters, indicating that the configuration of parallel double crossovers is not energetically preferred. However, the direct measurement by AFM force-controlled mapping shows that both DNA junctions of anti-parallel and parallel double crossovers have homogeneous mechanical stability with any part of DNA origami.

  14. Parallelizing quantum circuit synthesis

    NASA Astrophysics Data System (ADS)

    Di Matteo, Olivia; Mosca, Michele

    2016-03-01

    Quantum circuit synthesis is the process in which an arbitrary unitary operation is decomposed into a sequence of gates from a universal set, typically one which a quantum computer can implement both efficiently and fault-tolerantly. As physical implementations of quantum computers improve, the need is growing for tools that can effectively synthesize components of the circuits and algorithms they will run. Existing algorithms for exact, multi-qubit circuit synthesis scale exponentially in the number of qubits and circuit depth, leaving synthesis intractable for circuits on more than a handful of qubits. Even modest improvements in circuit synthesis procedures may lead to significant advances, pushing forward the boundaries of not only the size of solvable circuit synthesis problems, but also in what can be realized physically as a result of having more efficient circuits. We present a method for quantum circuit synthesis using deterministic walks. Also termed pseudorandom walks, these are walks in which once a starting point is chosen, its path is completely determined. We apply our method to construct a parallel framework for circuit synthesis, and implement one such version performing optimal T-count synthesis over the Clifford+T gate set. We use our software to present examples where parallelization offers a significant speedup on the runtime, as well as directly confirm that the 4-qubit 1-bit full adder has optimal T-count 7 and T-depth 3.

  15. Parallel ptychographic reconstruction

    PubMed Central

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-01-01

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It is able to be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source. PMID:25607174

  16. Tolerant (parallel) Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

  17. Applied Parallel Metadata Indexing

    SciTech Connect

    Jacobi, Michael R

    2012-08-01

    The GPFS Archive is parallel archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and searach the archive using standard and user-defined metadata, while maintaining security. last summer, I developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, the author implemented security and developed the user interfae for the search tool. To meet security requirements, each database table is associated with a single user, which only stores records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.

  18. A systolic array parallelizing compiler

    SciTech Connect

    Tseng, P.S. )

    1990-01-01

    This book presents a completely new approach to the problem of systolic array parallelizing compiler. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler which can generate efficient parallel code for complete LINPACK routines. This book begins by analyzing the architectural strength of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. A complete listing of the source program and compiler-generated parallel code are given to clarify the overall picture of the compiler. The book concludes that systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.

  19. Simulating Billion-Task Parallel Programs

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J

    2014-01-01

    In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.

  20. Extending HPF for advanced data parallel applications

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

    1994-01-01

    The stated goal of High Performance Fortran (HPF) was to 'address the problems of writing data parallel programs where the distribution of data affects performance'. After examining the current version of the language we are led to the conclusion that HPF has not fully achieved this goal. While the basic distribution functions offered by the language - regular block, cyclic, and block cyclic distributions - can support regular numerical algorithms, advanced applications such as particle-in-cell codes or unstructured mesh solvers cannot be expressed adequately. We believe that this is a major weakness of HPF, significantly reducing its chances of becoming accepted in the numeric community. The paper discusses the data distribution and alignment issues in detail, points out some flaws in the basic language, and outlines possible future paths of development. Furthermore, we briefly deal with the issue of task parallelism and its integration with the data parallel paradigm of HPF.

  1. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.

  2. Detecting opportunities for parallel observations on the Hubble Space Telescope

    NASA Technical Reports Server (NTRS)

    Lucks, Michael

    1992-01-01

    The presence of multiple scientific instruments aboard the Hubble Space Telescope provides opportunities for parallel science, i.e., the simultaneous use of different instruments for different observations. Determining whether candidate observations are suitable for parallel execution depends on numerous criteria (some involving quantitative tradeoffs) that may change frequently. A knowledge based approach is presented for constructing a scoring function to rank candidate pairs of observations for parallel science. In the Parallel Observation Matching System (POMS), spacecraft knowledge and schedulers' preferences are represented using a uniform set of mappings, or knowledge functions. Assessment of parallel science opportunities is achieved via composition of the knowledge functions in a prescribed manner. The knowledge acquisition, and explanation facilities of the system are presented. The methodology is applicable to many other multiple criteria assessment problems.

  3. Parallel Computing in SCALE

    SciTech Connect

    DeHart, Mark D; Williams, Mark L; Bowman, Stephen M

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

  4. Toward Parallel Document Clustering

    SciTech Connect

    Mogill, Jace A.; Haglin, David J.

    2011-09-01

    A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a triangle inequality obeying distance metric, the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program and initial performance results of end-to-end a document processing workflow are reported.

  5. Parallel Polarization State Generation

    NASA Astrophysics Data System (ADS)

    She, Alan; Capasso, Federico

    2016-05-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.

  6. Parallel Polarization State Generation

    PubMed Central

    She, Alan; Capasso, Federico

    2016-01-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813

  7. A parallel programming environment supporting multiple data-parallel modules

    SciTech Connect

    Seevers, B.K.; Quinn, M.J. ); Hatcher, P.J. )

    1992-10-01

    We describe a system that allows programmers to take advantage of both control and data parallelism through multiple intercommunicating data-parallel modules. This programming environment extends C-type stream I/O to include intermodule communication channels. The progammer writes each module as a separate data-parallel program, then develops a channel linker specification describing how to connect the modules together. A channel linker we have developed loads the separate modules on the parallel machine and binds the communication channels together as specified. We present performance data that demonstrates a mixed control- and data-parallel solution can yield better performance than a strictly data-parallel solution. The system described currently runs on the Intel iWarp multicomputer.

  8. Parallel imaging microfluidic cytometer.

    PubMed

    Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching

    2011-01-01

    By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times for the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take.

  9. A parallel algorithm for channel routing on a hypercube

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall; Banerjee, Prithviraj

    1987-01-01

    A new parallel simulated annealing algorithm for channel routing on a P processor hypercube is presented. The basic idea used is to partition a set of tracks equally among processors in the hypercube. In parallel, P/2 pairs of processors perform displacements and exchanges of nets between tracks, compute the changes in cost functions, and accept moves using a parallel annealing criteria. Through the use of a unique distributed data structure, it is possible to minimize message traffic and add versatility and efficiency in a parallel routing tool. The algorithm has been implemented and is being tested on some of the popular channel problems from the literature.

  10. The BLAZE language - A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  11. The BLAZE language: A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.

  12. Parallelization and automatic data distribution for nuclear reactor simulations

    SciTech Connect

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  13. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Lau, Sonie

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.

  14. High Performance Parallel Architectures

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek; Kaewpijit, Sinthop

    1998-01-01

    Traditional remote sensing instruments are multispectral, where observations are collected at a few different spectral bands. Recently, many hyperspectral instruments, that can collect observations at hundreds of bands, have been operational. Furthermore, there have been ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these remote sensing technology developments hold great promise for new findings in the area of Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes, and methods for data reduction. Dimension Reduction is a spectral transformation, aimed at concentrating the vital information and discarding redundant data. One such transformation, which is widely used in remote sensing, is the Principal Components Analysis (PCA). This report summarizes our progress on the development of a parallel PCA and its implementation on two Beowulf cluster configuration; one with fast Ethernet switch and the other with a Myrinet interconnection. Details of the implementation and performance results, for typical sets of multispectral and hyperspectral NASA remote sensing data, are presented and analyzed based on the algorithm requirements and the underlying machine configuration. It will be shown that the PCA application is quite challenging and hard to scale on Ethernet-based clusters. However, the measurements also show that a high- performance interconnection network, such as Myrinet, better matches the high communication demand of PCA and can lead to a more efficient PCA execution.

  15. Parallelization of a blind deconvolution algorithm

    NASA Astrophysics Data System (ADS)

    Matson, Charles L.; Borelli, Kathy J.

    2006-09-01

    Often it is of interest to deblur imagery in order to obtain higher-resolution images. Deblurring requires knowledge of the blurring function - information that is often not available separately from the blurred imagery. Blind deconvolution algorithms overcome this problem by jointly estimating both the high-resolution image and the blurring function from the blurred imagery. Because blind deconvolution algorithms are iterative in nature, they can take minutes to days to deblur an image depending how many frames of data are used for the deblurring and the platforms on which the algorithms are executed. Here we present our progress in parallelizing a blind deconvolution algorithm to increase its execution speed. This progress includes sub-frame parallelization and a code structure that is not specialized to a specific computer hardware architecture.

  16. Parallel Adaptive Mesh Refinement

    SciTech Connect

    Diachin, L; Hornung, R; Plassmann, P; WIssink, A

    2005-03-04

    As large-scale, parallel computers have become more widely available and numerical models and algorithms have advanced, the range of physical phenomena that can be simulated has expanded dramatically. Many important science and engineering problems exhibit solutions with localized behavior where highly-detailed salient features or large gradients appear in certain regions which are separated by much larger regions where the solution is smooth. Examples include chemically-reacting flows with radiative heat transfer, high Reynolds number flows interacting with solid objects, and combustion problems where the flame front is essentially a two-dimensional sheet occupying a small part of a three-dimensional domain. Modeling such problems numerically requires approximating the governing partial differential equations on a discrete domain, or grid. Grid spacing is an important factor in determining the accuracy and cost of a computation. A fine grid may be needed to resolve key local features while a much coarser grid may suffice elsewhere. Employing a fine grid everywhere may be inefficient at best and, at worst, may make an adequately resolved simulation impractical. Moreover, the location and resolution of fine grid required for an accurate solution is a dynamic property of a problem's transient features and may not be known a priori. Adaptive mesh refinement (AMR) is a technique that can be used with both structured and unstructured meshes to adjust local grid spacing dynamically to capture solution features with an appropriate degree of resolution. Thus, computational resources can be focused where and when they are needed most to efficiently achieve an accurate solution without incurring the cost of a globally-fine grid. Figure 1.1 shows two example computations using AMR; on the left is a structured mesh calculation of a impulsively-sheared contact surface and on the right is the fuselage and volume discretization of an RAH-66 Comanche helicopter [35]. Note the

  17. A Parallel Particle Swarm Optimizer

    DTIC Science & Technology

    2003-01-01

    by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based...concurrent computation. The parallelization of the Particle Swarm Optimization (PSO) algorithm is detailed and its performance and characteristics demonstrated for the biomechanical system identification problem as example.

  18. A Parallel, High-Fidelity Radar Model

    DTIC Science & Technology

    2010-09-01

    UCRL # LLNL-CONF-454075 A Parallel, High-Fidelity Radar Model Matthew Horsley, Benjamin Fasenfest Lawrence Livermore National Laboratory...Network to a collision or satellite break-up event. A high fidelity physics-based radar simulator has been developed for Space Surveillance...applications. This simulator is designed in a modular fashion, where each module describes a particular physical process or radar function (radio wave

  19. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining

  20. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  1. Parallel contingency statistics with Titan.

    SciTech Connect

    Thompson, David C.; Pebay, Philippe Pierre

    2009-09-01

    This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09] which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of this new parallel engines is illustrated by the means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevent this new engine from exhibiting optimal parallel speed-up as the aforementioned engines do. This report therefore discusses the design trade-offs we made and study performance with up to 200 processors.

  2. Kalman Filter Tracking on Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2016-11-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment.

  3. Approximating Functions with Exponential Functions

    ERIC Educational Resources Information Center

    Gordon, Sheldon P.

    2005-01-01

    The possibility of approximating a function with a linear combination of exponential functions of the form e[superscript x], e[superscript 2x], ... is considered as a parallel development to the notion of Taylor polynomials which approximate a function with a linear combination of power function terms. The sinusoidal functions sin "x" and cos "x"…

  4. Parallel NPARC: Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Townsend, S. E.

    1996-01-01

    Version 3 of the NPARC Navier-Stokes code includes support for large-grain (block level) parallelism using explicit message passing between a heterogeneous collection of computers. This capability has the potential for significant performance gains, depending upon the block data distribution. The parallel implementation uses a master/worker arrangement of processes. The master process assigns blocks to workers, controls worker actions, and provides remote file access for the workers. The processes communicate via explicit message passing using an interface library which provides portability to a number of message passing libraries, such as PVM (Parallel Virtual Machine). A Bourne shell script is used to simplify the task of selecting hosts, starting processes, retrieving remote files, and terminating a computation. This script also provides a simple form of fault tolerance. An analysis of the computational performance of NPARC is presented, using data sets from an F/A-18 inlet study and a Rocket Based Combined Cycle Engine analysis. Parallel speedup and overall computational efficiency were obtained for various NPARC run parameters on a cluster of IBM RS6000 workstations. The data show that although NPARC performance compares favorably with the estimated potential parallelism, typical data sets used with previous versions of NPARC will often need to be reblocked for optimum parallel performance. In one of the cases studied, reblocking increased peak parallel speedup from 3.2 to 11.8.

  5. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Lau, Sonie; Yan, Jerry C.

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

  6. Parallel integer sorting with medium and fine-scale parallelism

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.

  7. EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS

    SciTech Connect

    F. PETRINI; W. FENG

    1999-09-01

    We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of low-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.

  8. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens; Inglett, Todd Alan

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.

  9. Parallel Architecture For Robotics Computation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1990-01-01

    Universal Real-Time Robotic Controller and Simulator (URRCS) is highly parallel computing architecture for control and simulation of robot motion. Result of extensive algorithmic study of different kinematic and dynamic computational problems arising in control and simulation of robot motion. Study led to development of class of efficient parallel algorithms for these problems. Represents algorithmically specialized architecture, in sense capable of exploiting common properties of this class of parallel algorithms. System with both MIMD and SIMD capabilities. Regarded as processor attached to bus of external host processor, as part of bus memory.

  10. Multigrid on massively parallel architectures

    SciTech Connect

    Falgout, R D; Jones, J E

    1999-09-17

    The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.

  11. Quantitative selection and parallel characterization of aptamers

    PubMed Central

    Cho, Minseon; Soo Oh, Seung; Nie, Jeff; Stewart, Ron; Eisenstein, Michael; Chambers, James; Marth, Jamey D.; Walker, Faye; Thomson, James A.; Soh, H. Tom

    2013-01-01

    Aptamers are promising affinity reagents that are potentially well suited for high-throughput discovery, as they are chemically synthesized and discovered via completely in vitro selection processes. Recent advancements in selection, sequencing, and the use of modified bases have improved aptamer quality, but the overall process of aptamer generation remains laborious and low-throughput. This is because binding characterization remains a critical bottleneck, wherein the affinity and specificity of each candidate aptamer are measured individually in a serial manner. To accelerate aptamer discovery, we devised the Quantitative Parallel Aptamer Selection System (QPASS), which integrates microfluidic selection and next-generation sequencing with in situ-synthesized aptamer arrays, enabling simultaneous measurement of affinity and specificity for thousands of candidate aptamers in parallel. After using QPASS to select aptamers for the human cancer biomarker angiopoietin-2 (Ang2), we in situ synthesized arrays of the selected sequences and obtained equilibrium dissociation constants (Kd) for every aptamer in parallel. We thereby identified over a dozen high-affinity Ang2 aptamers, with Kd as low as 20.5 ± 7.3 nM. The same arrays enabled us to quantify binding specificity for these aptamers in parallel by comparing relative binding of differentially labeled target and nontarget proteins, and by measuring their binding affinity directly in complex samples such as undiluted serum. Finally, we show that QPASS offers a compelling avenue for exploring structure−function relationships for large numbers of aptamers in parallel by coupling array-based affinity measurements with next-generation sequencing data to identify nucleotides and motifs within the aptamer that critically affect Ang2 binding. PMID:24167271

  12. IOPA: I/O-aware parallelism adaption for parallel programs

    PubMed Central

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236

  13. Appendix E: Parallel Pascal development system

    NASA Technical Reports Server (NTRS)

    1985-01-01

    The Parallel Pascal Development System enables Parallel Pascal programs to be developed and tested on a conventional computer. It consists of several system programs, including a Parallel Pascal to standard Pascal translator, and a library of Parallel Pascal subprograms. The library includes subprograms for using Parallel Pascal on a parallel system with a fixed degree of parallelism, such as the Massively Parallel Processor, to conveniently manipulate arrays which have dimensions than the hardware. Programs can be conveninetly tested with small sized arrays on the conventional computer before attempting to run on a parallel system.

  14. Parallel hierarchical method in networks

    NASA Astrophysics Data System (ADS)

    Malinochka, Olha; Tymchenko, Leonid

    2007-09-01

    This method of parallel-hierarchical Q-transformation offers new approach to the creation of computing medium - of parallel -hierarchical (PH) networks, being investigated in the form of model of neurolike scheme of data processing [1-5]. The approach has a number of advantages as compared with other methods of formation of neurolike media (for example, already known methods of formation of artificial neural networks). The main advantage of the approach is the usage of multilevel parallel interaction dynamics of information signals at different hierarchy levels of computer networks, that enables to use such known natural features of computations organization as: topographic nature of mapping, simultaneity (parallelism) of signals operation, inlaid cortex, structure, rough hierarchy of the cortex, spatially correlated in time mechanism of perception and training [5].

  15. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

  16. "Feeling" Series and Parallel Resistances.

    ERIC Educational Resources Information Center

    Morse, Robert A.

    1993-01-01

    Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)

  17. Demonstrating Forces between Parallel Wires.

    ERIC Educational Resources Information Center

    Baker, Blane

    2000-01-01

    Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)

  18. Parallel programming of industrial applications

    SciTech Connect

    Heroux, M; Koniges, A; Simon, H

    1998-07-21

    In the introductory material, we overview the typical MPP environment for real application computing and the special tools available such as parallel debuggers and performance analyzers. Next, we draw from a series of real applications codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally we end with a summary of the lessons learned from these applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled: "Industrial Strength Parallel Computing" to be published by Morgan Kaufmann Publishers (ISBN l-55860-54).

  19. Flexibility and Performance of Parallel File Systems

    NASA Technical Reports Server (NTRS)

    Kotz, David; Nieuwejaar, Nils

    1996-01-01

    As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions make applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (API's). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.

  20. Oxytocin: parallel processing in the social brain?

    PubMed

    Dölen, Gül

    2015-06-01

    Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain.

  1. Parallel software tools at Langley Research Center

    NASA Technical Reports Server (NTRS)

    Moitra, Stuti; Tennille, Geoffrey M.; Lakeotes, Christopher D.; Randall, Donald P.; Arthur, Jarvis J.; Hammond, Dana P.; Mall, Gerald H.

    1993-01-01

    This document gives a brief overview of parallel software tools available on the Intel iPSC/860 parallel computer at Langley Research Center. It is intended to provide a source of information that is somewhat more concise than vendor-supplied material on the purpose and use of various tools. Each of the chapters on tools is organized in a similar manner covering an overview of the functionality, access information, how to effectively use the tool, observations about the tool and how it compares to similar software, known problems or shortfalls with the software, and reference documentation. It is primarily intended for users of the iPSC/860 at Langley Research Center and is appropriate for both the experienced and novice user.

  2. Address tracing for parallel machines

    NASA Technical Reports Server (NTRS)

    Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

    1991-01-01

    Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.

  3. Parallel Algorithms for Image Analysis.

    DTIC Science & Technology

    1982-06-01

    8217 _ _ _ _ _ _ _ 4. TITLE (aid Subtitle) S. TYPE OF REPORT & PERIOD COVERED PARALLEL ALGORITHMS FOR IMAGE ANALYSIS TECHNICAL 6. PERFORMING O4G. REPORT NUMBER TR-1180...Continue on reverse side it neceesary aid Identlfy by block number) Image processing; image analysis ; parallel processing; cellular computers. 20... IMAGE ANALYSIS TECHNICAL 6. PERFORMING ONG. REPORT NUMBER TR-1180 - 7. AUTHOR(&) S. CONTRACT OR GRANT NUMBER(s) Azriel Rosenfeld AFOSR-77-3271 9

  4. Debugging in a parallel environment

    SciTech Connect

    Wasserman, H.J.; Griffin, J.H.

    1985-01-01

    This paper describes the preliminary results of a project investigating approaches to dynamic debugging in parallel processing systems. Debugging programs in a multiprocessing environment is particularly difficult because of potential errors in synchronization of tasks, data dependencies, sharing of data among tasks, and irreproducibility of specific machine instruction sequences from one job to the next. The basic methodology involved in predicate-based debuggers is given as well as other desirable features of dynamic parallel debugging. 13 refs.

  5. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. c2001 The Willi Hennig Society.

  6. Efficiency of parallel direct optimization.

    PubMed

    Janies, D A; Wheeler, W C

    2001-03-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size.

  7. Modelling Layer parallel stylolites

    NASA Astrophysics Data System (ADS)

    Koehn, Daniel; Pataki Rood, Daisy; Beaudoin, Nicolas

    2016-04-01

    We modeled the geometrical roughening of mainly layer-dominated stylolites in order to understand their structural evolution, to present an advanced classification of stylolite shapes and to relate this classification to chemical compaction and stylolite sealing capabilities. Our simulations show that layer-dominated stylolites can grow in three distinct stages, an initial slow nucleation, a fast layer-pinning phase and a final freezing stage if the layer dissolves completely during growth. Dissolution of the pinning layer and thus destruction of the compaction tracking capabilities is a function of the background noise in the rock and the dissolution rate of the layer itself. Low background noise needs a slower dissolving layer for pinning to be successful but produces flatter teeth than higher background noise. We present an advanced classification based on our simulations and separate stylolites into four classes: rectangular layer type, seismogram pinning type, suture/sharp peak type and simple wave-like type.

  8. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-18

    Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  9. Memory Scalability and Efficiency Analysis of Parallel Codes

    SciTech Connect

    Janjusic, Tommy; Kartsaklis, Christos

    2015-01-01

    Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for High Performance Systems are typically designed over the span of several, and in some instances 10+, years. As a result, optimization practices which were appropriate for earlier systems may no longer be valid and thus require careful optimization consideration. Specifically, parallel codes whose memory footprint is a function of their scalability must be carefully considered for future exa-scale systems. In this paper we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology we evaluate an application s memory footprint as a function of scalability, which we coined memory efficiency, and describe our results. In particular, using our in-house tools we can pinpoint the specific application components which contribute to the application s overall memory foot-print (application data- structures, libraries, etc.).

  10. CALTRANS: A parallel, deterministic, 3D neutronics code

    SciTech Connect

    Carson, L.; Ferguson, J.; Rogers, J.

    1994-04-01

    Our efforts to parallelize the deterministic solution of the neutron transport equation has culminated in a new neutronics code CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementation of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.

  11. Programming parallel architectures: The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  12. Parallelizing the track-target model for the MIMD machine

    SciTech Connect

    Zhong Xiong, W.; Swietlik, C.

    1992-09-01

    Military Tracking-Target systems are important analysis tools for modelling the major functions of a strategic defense system operating against a ballistic missile threat during a simulated end-to-end scenario. As demands grow for modelling more trajectories with increasing numbers of missile types, so have demands for more processing power. Argonne National Laboratory has developed the parallel version of this Tracking-Target model. The parallel version has exhibited speedups of up to a factor of 6.3 resulting from a shared memory multiprocessor machine. This paper documents a project to implement the Tracking-Target model on a parallel processing environment.

  13. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective opeartion through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  14. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  15. Parallels in lignin biosynthesis

    PubMed Central

    Weng, Jing-Ke; Banks, Jo Ann

    2008-01-01

    A hallmark of vascular plants is the development of a complex water-conducting system, which is physically reinforced by the heterogeneous aromatic polymer lignin. Syringyl lignin, a major building block of lignin, is often thought to be uniquely characteristic of angiosperms; however, it was demonstrated over fifty years ago that that syringyl lignin is found in another group of plants, known as the lycophytes, the ancestors of which diverged from all the other vascular plant lineages 400 million years ago.1 To determine the biochemical basis for this common biosynthetic ability, we isolated and characterized cytochrome P450-dependent monooxygenases (P450s) from the lycophyte Selaginella moellendorffii and compared them to the enzyme that is required for syringyl lignin synthesis in angiosperms. Our results showed that one of these P450s encodes an enzyme that is functionally analogous to but phylogenetically independent from its angiosperm counterpart. Here, we discuss the evolution of lignin biosynthesis in vascular plants and the role of Selaginella moellendorffii in plant comparative biology and genomics. PMID:19704782

  16. The economics of parallel trade.

    PubMed

    Danzon, P M

    1998-03-01

    The potential for parallel trade in the European Union (EU) has grown with the accession of low price countries and the harmonisation of registration requirements. Parallel trade implies a conflict between the principle of autonomy of member states to set their own pharmaceutical prices, the principle of free trade and the industrial policy goal of promoting innovative research and development (R&D). Parallel trade in pharmaceuticals does not yield the normal efficiency gains from trade because countries achieve low pharmaceutical prices by aggressive regulation, not through superior efficiency. In fact, parallel trade reduces economic welfare by undermining price differentials between markets. Pharmaceutical R&D is a global joint cost of serving all consumers worldwide; it accounts for roughly 30% of total costs. Optimal (welfare maximising) pricing to cover joint costs (Ramsey pricing) requires setting different prices in different markets, based on inverse demand elasticities. By contrast, parallel trade and regulation based on international price comparisons tend to force price convergence across markets. In response, manufacturers attempt to set a uniform 'euro' price. The primary losers from 'euro' pricing will be consumers in low income countries who will face higher prices or loss of access to new drugs. In the long run, even higher income countries are likely to be worse off with uniform prices, because fewer drugs will be developed. One policy option to preserve price differentials is to exempt on-patent products from parallel trade. An alternative is confidential contracting between individual manufacturers and governments to provide country-specific ex post discounts from the single 'euro' wholesale price, similar to rebates used by managed care in the US. This would preserve differentials in transactions prices even if parallel trade forces convergence of wholesale prices.

  17. A class of trust-region methods for parallel optimization

    SciTech Connect

    P. D. Hough; J. C. Meza

    1999-03-01

    The authors present a new class of optimization methods that incorporates a Parallel Direct Search (PDS) method within a trust-region Newton framework. This approach combines the inherent parallelism of PDS with the rapid and robust convergence properties of Newton methods. Numerical tests have yielded favorable results for both standard test problems and engineering applications. In addition, the new method appears to be more robust in the presence of noisy functions that are inherent in many engineering simulations.

  18. Parallel Quantum Circuit in a Tunnel Junction

    NASA Astrophysics Data System (ADS)

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian; GNS theory Group Team

    In between 2 metallic nanopads, adding identical and independent electron transfer paths in parallel increases the electronic effective coupling between the 2 nanopads through the quantum circuit defined by those paths. Measuring this increase of effective coupling using the tunnelling current intensity can lead for example for 2 paths in parallel to the now standard G =G1 +G2 + 2√{G1 .G2 } conductance superposition law (1). This is only valid for the tunnelling regime (2). For large electronic coupling to the nanopads (or at resonance), G can saturate and even decay as a function of the number of parallel paths added in the quantum circuit (3). We provide here the explanation of this phenomenon: the measurement of the effective Rabi oscillation frequency using the current intensity is constrained by the normalization principle of quantum mechanics. This limits the quantum conductance G for example to go when there is only one channel per metallic nanopads. This ef fect has important consequences for the design of Boolean logic gates at the atomic scale using atomic scale or intramolecular circuits. References: This has the financial support by European PAMS project.

  19. Kalman Filter Tracking on Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2015-12-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques including Cellular Automata or returning to Hough Transform. The most common track finding techniques in use today are however those based on the Kalman Filter [2]. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust and are exactly those being used today for the design of the tracking system for HL-LHC. Our previous investigations showed that, using optimized data structures, track fitting with Kalman Filter can achieve large speedup both with Intel Xeon and Xeon Phi. We report here our further progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a realistic simulation setup.

  20. Parallel Stitching of Two-Dimensional Materials

    NASA Astrophysics Data System (ADS)

    Ling, Xi; Lin, Yuxuan; Dresselhaus, Mildred; Palacios, Tomás; Kong, Jing; Department of Electrical Engineering; Computer Science, Massachusetts Institute of Technology Team

    Large scale integration of atomically thin metals (e.g. graphene), semiconductors (e.g. transition metal dichalcogenides (TMDs)), and insulators (e.g. hexagonal boron nitride) is critical for constructing the building blocks for future nanoelectronics and nanophotonics. However, the construction of in-plane heterostructures, especially between two atomic layers with large lattice mismatch, could be extremely difficult due to the strict requirement of spatial precision and the lack of a selective etching method. Here, we developed a general synthesis methodology to achieve both vertical and in-plane ``parallel stitched'' heterostructures between a two-dimensional (2D) and TMD materials, which enables both multifunctional electronic/optoelectronic devices and their large scale integration. This is achieved via selective ``sowing'' of aromatic molecule seeds during the chemical vapor deposition growth. MoS2 is used as a model system to form heterostructures with diverse other 2D materials. Direct and controllable synthesis of large-scale parallel stitched graphene-MoS2 heterostructures was further investigated. Unique nanometer overlapped junctions were obtained at the parallel stitched interface, which are highly desirable both as metal-semiconductor contact and functional devices/systems, such as for use in logical integrated circuits (ICs) and broadband photodetectors.

  1. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  2. Quality of life in patients with functional dyspepsia: Short- and long-term effect of Helicobacter pylori eradication with pantoprazole, amoxicillin, and clarithromycin or cisapride therapy: A prospective, parallel-group study

    PubMed Central

    Buzás, György M.

    2006-01-01

    Background: Quality of life (QOL) is impaired in functional dyspepsia (FD). Little is known about the effects of different therapies on the QOL profile in patients with this condition. Objectives: The aims of this study were to measure baseline QOL in patients with FD and to assess changes in QOL over time associated with Helicobacter pylori eradication and prokinetic treatment. The primary and secondary end points were the improvement in QOL 6 weeks and 1 year after successful eradication of the infection or prokinetic therapy. Methods: This 1-year, single-center, prospective, open-label, controlled, parallel-group trial was conducted at the Department of Gastroenterology, Ferencvdros Health Centre, Budapest, Hungary. The Functional Digestive Disorder Quality of Life (FDDQoL) Questionnaire (MAPI Research Institute, Lyon, France) was translated and validated previously in Hungarian. Male and female subjects aged 20 to 60 years were enrolled and classified as H pylori positive (HP+), H pylori negative (HP-) with FD, or healthy (control group). The HP+ patients received pantoprazole 40 mg BID + amoxicillin 1000 mg BID + clarithromycin 500 mg BID for 7 days, followed by on-demand ranitidine (150–300 mg/d) for 1 year. The HP- patients received the prokinetic cisapride 10 mg TID for 1 month, followed by on-demand cisapride (10–20 mg/d) for 1 year. The FDDQoL questionnaire was completed by all 3 groups on enrollment, at 6 weeks, and 1 year. Results: A total of 101 HP+ patients, 98 HP- patients, and 123 healthy controls were included in the study (185 women, 137 men; mean age, 39.0 ears). The mean (SD) baseline QOL scores were significantly lower in the HP+ group (53.3 [9.6]; 95% CI, 54.4-58.2) and the HP- groups (50.0 [9.8]; 95% CI, 58.0–62.0) compared with that in healthy controls (76.2 [8.7]; 95% CI, 74.6–77.8) (both, P < 0.001). Analysis of the short-term domain scores found that the HP+ group had significantly decreased scores in 6 of 8 domains: daily

  3. FEREBUS: Highly parallelized engine for kriging training.

    PubMed

    Di Pasquale, Nicodemo; Bane, Michael; Davie, Stuart J; Popelier, Paul L A

    2016-11-05

    FFLUX is a novel force field based on quantum topological atoms, combining multipolar electrostatics with IQA intraatomic and interatomic energy terms. The program FEREBUS calculates the hyperparameters of models produced by the machine learning method kriging. Calculation of kriging hyperparameters (θ and p) requires the optimization of the concentrated log-likelihood L̂(θ,p). FEREBUS uses Particle Swarm Optimization (PSO) and Differential Evolution (DE) algorithms to find the maximum of L̂(θ,p). PSO and DE are two heuristic algorithms that each use a set of particles or vectors to explore the space in which L̂(θ,p) is defined, searching for the maximum. The log-likelihood is a computationally expensive function, which needs to be calculated several times during each optimization iteration. The cost scales quickly with the problem dimension and speed becomes critical in model generation. We present the strategy used to parallelize FEREBUS, and the optimization of L̂(θ,p) through PSO and DE. The code is parallelized in two ways. MPI parallelization distributes the particles or vectors among the different processes, whereas the OpenMP implementation takes care of the calculation of L̂(θ,p), which involves the calculation and inversion of a particular matrix, whose size increases quickly with the dimension of the problem. The run time shows a speed-up of 61 times going from single core to 90 cores with a saving, in one case, of ∼98% of the single core time. In fact, the parallelization scheme presented reduces computational time from 2871 s for a single core calculation, to 41 s for 90 cores calculation. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.

  4. Visualizing Parallel Computer System Performance

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.

    1988-01-01

    Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed, almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels, it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkedly adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized Large, complex parallel system pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eluding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.

  5. Features in Continuous Parallel Coordinates.

    PubMed

    Lehmann, Dirk J; Theisel, Holger

    2011-12-01

    Continuous Parallel Coordinates (CPC) are a contemporary visualization technique in order to combine several scalar fields, given over a common domain. They facilitate a continuous view for parallel coordinates by considering a smooth scalar field instead of a finite number of straight lines. We show that there are feature curves in CPC which appear to be the dominant structures of a CPC. We present methods to extract and classify them and demonstrate their usefulness to enhance the visualization of CPCs. In particular, we show that these feature curves are related to discontinuities in Continuous Scatterplots (CSP). We show this by exploiting a curve-curve duality between parallel and Cartesian coordinates, which is a generalization of the well-known point-line duality. Furthermore, we illustrate the theoretical considerations. Concluding, we discuss relations and aspects of the CPC's/CSP's features concerning the data analysis.

  6. PARAVT: Parallel Voronoi tessellation code

    NASA Astrophysics Data System (ADS)

    González, R. E.

    2016-10-01

    In this study, we present a new open source code for massive parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is focused for astrophysical purposes where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes, however no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI and VT using Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes neighbors list, Voronoi density, Voronoi cell volume, density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.

  7. Parallel integrated frame synchronizer chip

    NASA Technical Reports Server (NTRS)

    Ghuman, Parminder Singh (Inventor); Solomon, Jeffrey Michael (Inventor); Bennett, Toby Dennis (Inventor)

    2000-01-01

    A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.

  8. Fast data parallel polygon rendering

    SciTech Connect

    Ortega, F.A.; Hansen, C.D.

    1993-09-01

    This paper describes a parallel method for polygonal rendering on a massively parallel SIMD machine. This method, based on a simple shading model, is targeted for applications which require very fast polygon rendering for extremely large sets of polygons such as is found in many scientific visualization applications. The algorithms described in this paper are incorporated into a library of 3D graphics routines written for the Connection Machine. The routines are implemented on both the CM-200 and the CM-5. This library enables a scientists to display 3D shaded polygons directly from a parallel machine without the need to transmit huge amounts of data to a post-processing rendering system.

  9. Massively Parallel MRI Detector Arrays

    PubMed Central

    Keil, Boris; Wald, Lawrence L

    2013-01-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758

  10. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.

  11. Fast parallel algorithm for CT image reconstruction.

    PubMed

    Flores, Liubov A; Vidal, Vicent; Mayo, Patricia; Rodenas, Francisco; Verdú, Gumersindo

    2012-01-01

    In X-ray computed tomography (CT) the X rays are used to obtain the projection data needed to generate an image of the inside of an object. The image can be generated with different techniques. Iterative methods are more suitable for the reconstruction of images with high contrast and precision in noisy conditions and from a small number of projections. Their use may be important in portable scanners for their functionality in emergency situations. However, in practice, these methods are not widely used due to the high computational cost of their implementation. In this work we analyze iterative parallel image reconstruction with the Portable Extensive Toolkit for Scientific computation (PETSc).

  12. Hybrid parallel programming with MPI and Unified Parallel C.

    SciTech Connect

    Dinan, J.; Balaji, P.; Lusk, E.; Sadayappan, P.; Thakur, R.; Mathematics and Computer Science; The Ohio State Univ.

    2010-01-01

    The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance; using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33 and using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.

  13. Parallel Proximity Detection for Computer Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1998-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  14. Parallel Proximity Detection for Computer Simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1997-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are includes by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  15. Medipix2 parallel readout system

    NASA Astrophysics Data System (ADS)

    Fanti, V.; Marzeddu, R.; Randaccio, P.

    2003-08-01

    A fast parallel readout system based on a PCI board has been developed in the framework of the Medipix collaboration. The readout electronics consists of two boards: the motherboard directly interfacing the Medipix2 chip, and the PCI board with digital I/O ports 32 bits wide. The device driver and readout software have been developed at low level in Assembler to allow fast data transfer and image reconstruction. The parallel readout permits a transfer rate up to 64 Mbytes/s. http://medipix.web.cern ch/MEDIPIX/

  16. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-03-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processors. User program and their gangs of processors are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quantums are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.

  17. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-12-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processes. User programs and their gangs of processes are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quantum are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory.

  18. The Complexity of Parallel Algorithms,

    DTIC Science & Technology

    1985-11-01

    Much of this work was done in collaboration with my advisor, Ernst Mayr . He was also supported in part by ONR contract N00014-85-C-0731. F ’. Table...Helinbold and Mayr in their algorithn to compute an optimal two processor schedule [HM2]. One of the promising developments in parallel algorithms is that...lei can be solved by it fast parallel algorithmmmi if the nmlmmmibers are smiall. llehmibold and Mayr JIlM I] have slhowm that. if Ole job timies are

  19. File concepts for parallel I/O

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1989-01-01

    The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, organizations are suggested, using multiple storage devices. Problem areas are also identified and discussed.

  20. Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

    PubMed

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-06-01

    We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.

  1. Parallel, Distributed Scripting with Python

    SciTech Connect

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider the a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.

  2. Fast, Massively Parallel Data Processors

    NASA Technical Reports Server (NTRS)

    Heaton, Robert A.; Blevins, Donald W.; Davis, ED

    1994-01-01

    Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.

  3. Optical Interferometric Parallel Data Processor

    NASA Technical Reports Server (NTRS)

    Breckinridge, J. B.

    1987-01-01

    Image data processed faster than in present electronic systems. Optical parallel-processing system effectively calculates two-dimensional Fourier transforms in time required by light to travel from plane 1 to plane 8. Coherence interferometer at plane 4 splits light into parts that form double image at plane 6 if projection screen placed there.

  4. Tutorial: Parallel Simulation on Supercomputers

    SciTech Connect

    Perumalla, Kalyan S

    2012-01-01

    This tutorial introduces typical hardware and software characteristics of extant and emerging supercomputing platforms, and presents issues and solutions in executing large-scale parallel discrete event simulation scenarios on such high performance computing systems. Covered topics include synchronization, model organization, example applications, and observed performance from illustrative large-scale runs.

  5. The physics of parallel machines

    NASA Technical Reports Server (NTRS)

    Chan, Tony F.

    1988-01-01

    The idea is considered that architectures for massively parallel computers must be designed to go beyond supporting a particular class of algorithms to supporting the underlying physical processes being modelled. Physical processes modelled by partial differential equations (PDEs) are discussed. Also discussed is the idea that an efficient architecture must go beyond nearest neighbor mesh interconnections and support global and hierarchical communications.

  6. Parallel distributed computing using Python

    NASA Astrophysics Data System (ADS)

    Dalcin, Lisandro D.; Paz, Rodrigo R.; Kler, Pablo A.; Cosimo, Alejandro

    2011-09-01

    This work presents two software components aimed to relieve the costs of accessing high-performance parallel computing resources within a Python programming environment: MPI for Python and PETSc for Python. MPI for Python is a general-purpose Python package that provides bindings for the Message Passing Interface (MPI) standard using any back-end MPI implementation. Its facilities allow parallel Python programs to easily exploit multiple processors using the message passing paradigm. PETSc for Python provides access to the Portable, Extensible Toolkit for Scientific Computation (PETSc) libraries. Its facilities allow sequential and parallel Python applications to exploit state of the art algorithms and data structures readily available in PETSc for the solution of large-scale problems in science and engineering. MPI for Python and PETSc for Python are fully integrated to PETSc-FEM, an MPI and PETSc based parallel, multiphysics, finite elements code developed at CIMEC laboratory. This software infrastructure supports research activities related to simulation of fluid flows with applications ranging from the design of microfluidic devices for biochemical analysis to modeling of large-scale stream/aquifer interactions.

  7. Parallelizing AT with MatlabMPI

    SciTech Connect

    Li, Evan Y.; /Brown U. /SLAC

    2011-06-22

    The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations, rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, which set up the necessary pre-requisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate incredibly efficient speed increments per processor in AT's beam-tracking functions. Extrapolating from prediction, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well-understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass, cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while

  8. Parallel global optimization with the particle swarm algorithm.

    PubMed

    Schutte, J F; Reinbolt, J A; Fregly, B J; Haftka, R T; George, A D

    2004-12-07

    Present day engineering optimization problems often impose large computational demands, resulting in long solution times even on a modern high-end processor. To obtain enhanced computational throughput and global search capability, we detail the coarse-grained parallelization of an increasingly popular global search method, the particle swarm optimization (PSO) algorithm. Parallel PSO performance was evaluated using two categories of optimization problems possessing multiple local minima-large-scale analytical test problems with computationally cheap function evaluations and medium-scale biomechanical system identification problems with computationally expensive function evaluations. For load-balanced analytical test problems formulated using 128 design variables, speedup was close to ideal and parallel efficiency above 95% for up to 32 nodes on a Beowulf cluster. In contrast, for load-imbalanced biomechanical system identification problems with 12 design variables, speedup plateaued and parallel efficiency decreased almost linearly with increasing number of nodes. The primary factor affecting parallel performance was the synchronization requirement of the parallel algorithm, which dictated that each iteration must wait for completion of the slowest fitness evaluation. When the analytical problems were solved using a fixed number of swarm iterations, a single population of 128 particles produced a better convergence rate than did multiple independent runs performed using sub-populations (8 runs with 16 particles, 4 runs with 32 particles, or 2 runs with 64 particles). These results suggest that (1) parallel PSO exhibits excellent parallel performance under load-balanced conditions, (2) an asynchronous implementation would be valuable for real-life problems subject to load imbalance, and (3) larger population sizes should be considered when multiple processors are available.

  9. Broadband monitoring simulation with massively parallel processors

    NASA Astrophysics Data System (ADS)

    Trubetskov, Mikhail; Amotchkina, Tatiana; Tikhonravov, Alexander

    2011-09-01

    Modern efficient optimization techniques, namely needle optimization and gradual evolution, enable one to design optical coatings of any type. Even more, these techniques allow obtaining multiple solutions with close spectral characteristics. It is important, therefore, to develop software tools that can allow one to choose a practically optimal solution from a wide variety of possible theoretical designs. A practically optimal solution provides the highest production yield when optical coating is manufactured. Computational manufacturing is a low-cost tool for choosing a practically optimal solution. The theory of probability predicts that reliable production yield estimations require many hundreds or even thousands of computational manufacturing experiments. As a result reliable estimation of the production yield may require too much computational time. The most time-consuming operation is calculation of the discrepancy function used by a broadband monitoring algorithm. This function is formed by a sum of terms over wavelength grid. These terms can be computed simultaneously in different threads of computations which opens great opportunities for parallelization of computations. Multi-core and multi-processor systems can provide accelerations up to several times. Additional potential for further acceleration of computations is connected with using Graphics Processing Units (GPU). A modern GPU consists of hundreds of massively parallel processors and is capable to perform floating-point operations efficiently.

  10. Predicting mining activity with parallel genetic algorithms

    USGS Publications Warehouse

    Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.; ,; ,

    2005-01-01

    We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.

  11. Supporting data intensive applications with medium grained parallelism

    SciTech Connect

    Pfaltz, J.L.; French, J.C.; Grimshaw, A.S.; Son, S.H.

    1992-04-01

    ADAMS is an ambitious effort to provide new database access paradigms for the kinds of scientific applications that require massively parallel access to very large data sets in order to be effective. Many of the Grand Challenge Problems fall into this category, as well as those kinds of scientific research which depend on widely distributed shared sets of disparate data. The essence of the ADAMS approach is to view data purely in functional terms, rather than the more traditional structural view in which multiple data items are aggregated into records or tuples of flat files. Further, ADAMS has been implemented as an embedded interface so that scientists can develop applications in the host programming language of their choice, often Fortran, Pascal, or C, and still access shared data generated in other environments. The syntax and semantics of ADAMS is essentially complete. The functional nature of the ADAMS data interface paradigm simplifies its implementation in a distributed environment, e.g., the Mentat run-time system, because one must only distribute functional servers, not pieces of data structures. However, this only opens up the possibility of effective parallel database processing; to realize this potential far more work must be done in the areas of data dependence, intra-statement parallelism, parallel query optimization, and maintaining consistency and reliability in concurrent systems. Discovering how to make effective parallel data access an actually in real scientific applications is the point of this research.

  12. Task parallelism and high-performance languages

    SciTech Connect

    Foster, I.

    1996-03-01

    The definition of High Performance Fortran (HPF) is a significant event in the maturation of parallel computing: it represents the first parallel language that has gained widespread support from vendors and users. The subject of this paper is to incorporate support for task parallelism. The term task parallelism refers to the explicit creation of multiple threads of control, or tasks, which synchronize and communicate under programmer control. Task and data parallelism are complementary rather than competing programming models. While task parallelism is more general and can be used to implement algorithms that are not amenable to data-parallel solutions, many problems can benefit from a mixed approach, with for example a task-parallel coordination layer integrating multiple data-parallel computations. Other problems admit to both data- and task-parallel solutions, with the better solution depending on machine characteristics, compiler performance, or personal taste. For these reasons, we believe that a general-purpose high-performance language should integrate both task- and data-parallel constructs. The challenge is to do so in a way that provides the expressivity needed for applications, while preserving the flexibility and portability of a high-level language. In this paper, we examine and illustrate the considerations that motivate the use of task parallelism. We also describe one particular approach to task parallelism in Fortran, namely the Fortran M extensions. Finally, we contrast Fortran M with other proposed approaches and discuss the implications of this work for task parallelism and high-performance languages.

  13. A generalized parallel replica dynamics

    NASA Astrophysics Data System (ADS)

    Binder, Andrew; Lelièvre, Tony; Simpson, Gideon

    2015-03-01

    Metastability is a common obstacle to performing long molecular dynamics simulations. Many numerical methods have been proposed to overcome it. One method is parallel replica dynamics, which relies on the rapid convergence of the underlying stochastic process to a quasi-stationary distribution. Two requirements for applying parallel replica dynamics are knowledge of the time scale on which the process converges to the quasi-stationary distribution and a mechanism for generating samples from this distribution. By combining a Fleming-Viot particle system with convergence diagnostics to simultaneously identify when the process converges while also generating samples, we can address both points. This variation on the algorithm is illustrated with various numerical examples, including those with entropic barriers and the 2D Lennard-Jones cluster of seven atoms.

  14. Merlin - Massively parallel heterogeneous computing

    NASA Technical Reports Server (NTRS)

    Wittie, Larry; Maples, Creve

    1989-01-01

    Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.

  15. Parallel supercomputing with commodity components

    SciTech Connect

    Warren, M.S.; Goda, M.P.; Becker, D.J.

    1997-09-01

    We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10{sup 15} floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.

  16. ASP: a parallel computing technology

    NASA Astrophysics Data System (ADS)

    Lea, R. M.

    1990-09-01

    ASP modules constitute the basis of a parallel computing technology platform for the rapid development of a broad range of numeric and symbolic information processing systems. Based on off-the-shelf general-purpose hardware and software modules ASP technology is intended to increase productivity in the development (and competitiveness in the marketing) of cost-effective low-MIMD/high-SIMD Massively Parallel Processor (MPPs). The paper discusses ASP module philosophy and demonstrates how ASP modules can satisfy the market algorithmic architectural and engineering requirements of such MPPs. In particular two specific ASP modules based on VLSI and WSI technologies are studied as case examples of ASP technology the latter reporting 1 TOPS/fl3 1 GOPS/W and 1 MOPS/$ as ball-park figures-of-merit of cost-effectiveness.

  17. Parallel processing spacecraft communication system

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)

    1998-01-01

    An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used; and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID. Each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If that channel number is zero, however, it indicates that the frame of data represents a critical command only. That data is handled in a special way, independent of the software. Otherwise, the processed data further handled using special double buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.

  18. A generalized parallel replica dynamics

    SciTech Connect

    Binder, Andrew; Lelièvre, Tony; Simpson, Gideon

    2015-03-01

    Metastability is a common obstacle to performing long molecular dynamics simulations. Many numerical methods have been proposed to overcome it. One method is parallel replica dynamics, which relies on the rapid convergence of the underlying stochastic process to a quasi-stationary distribution. Two requirements for applying parallel replica dynamics are knowledge of the time scale on which the process converges to the quasi-stationary distribution and a mechanism for generating samples from this distribution. By combining a Fleming–Viot particle system with convergence diagnostics to simultaneously identify when the process converges while also generating samples, we can address both points. This variation on the algorithm is illustrated with various numerical examples, including those with entropic barriers and the 2D Lennard-Jones cluster of seven atoms.

  19. Parallel supercomputing with commodity components

    NASA Technical Reports Server (NTRS)

    Warren, M. S.; Goda, M. P.; Becker, D. J.

    1997-01-01

    We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10(sup 15) floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.

  20. Parallel multiplex laser feedback interferometry

    SciTech Connect

    Zhang, Song; Tan, Yidong; Zhang, Shulian

    2013-12-15

    We present a parallel multiplex laser feedback interferometer based on spatial multiplexing which avoids the signal crosstalk in the former feedback interferometer. The interferometer outputs two close parallel laser beams, whose frequencies are shifted by two acousto-optic modulators by 2Ω simultaneously. A static reference mirror is inserted into one of the optical paths as the reference optical path. The other beam impinges on the target as the measurement optical path. Phase variations of the two feedback laser beams are simultaneously measured through heterodyne demodulation with two different detectors. Their subtraction accurately reflects the target displacement. Under typical room conditions, experimental results show a resolution of 1.6 nm and accuracy of 7.8 nm within the range of 100 μm.

  1. Parallelism in Manipulator Dynamics. Revision.

    DTIC Science & Technology

    1983-12-01

    excessive, and a VLSI implementation architecutre is suggested. We indicate possible appli- cations to incorporating dynamical considerations into...Inverse Dynamics problem. It investigates the high degree of parallelism inherent in the computations , and presents two "mathematically exact" formulations...and a 3 b Cases ............. ... 109 5 .9-- i 0. OVERVIEW The Inverse Dynamics problem consists (loosely) of computing the motor torques necessary to

  2. Parallel Symmetric Eigenvalue Problem Solvers

    DTIC Science & Technology

    2015-05-01

    graduate school. Debugging somebody else’s MPI code is an immensely frustrating experience, but he would regularly stay late at the oce to assist me...cessfully. In addition, I will describe the parallel kernels required by my code . 5 The next sections will describe my Fortran-based implementations of...Sandia’s publicly available Trace- Min code . Each of the methods has its own unique advantages and disadvantages, summarized in table 3.1. In short, I

  3. Parallel Algorithms for Computer Vision.

    DTIC Science & Technology

    1987-01-01

    73 755 P fiu.LEL ALORITHMS FOR CO PUTER VISIO (U) /MASSACHUSETTS INST OF TECH CRMORIDGE T P00010 ET AL.JAN 8? ETL-0456 DACA7-05-C-8IIO m 7E F/0 1...regularization principles, such as edge detection, stereo , motion, surface interpolation and shape from shading. The basic members of class I are convolution...them in collabo- ration with Thinking Machines Corporation): * Parallel convolution * Zero-crossing detection * Stereo -matching * Surface reconstruction

  4. Lightweight Specifications for Parallel Correctness

    DTIC Science & Technology

    2012-12-05

    this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204... George Necula Professor David Wessel Fall 2012 1 Abstract Lightweight Specifications for Parallel Correctness by Jacob Samuels Burnim Doctor of Philosophy...enthusiasm and endless flow of ideas, and for his keen research sense. I would also like to thank George Necula for chairing my qualifying exam committee and

  5. National Combustion Code: Parallel Performance

    NASA Technical Reports Server (NTRS)

    Babrauckas, Theresa

    2001-01-01

    This report discusses the National Combustion Code (NCC). The NCC is an integrated system of codes for the design and analysis of combustion systems. The advanced features of the NCC meet designers' requirements for model accuracy and turn-around time. The fundamental features at the inception of the NCC were parallel processing and unstructured mesh. The design and performance of the NCC are discussed.

  6. Parallel Algorithms for Computer Vision.

    DTIC Science & Technology

    1989-01-01

    demonstrated the Vision Machine system processing images and recognizing objects through the inte- gration of several visual cues. The first version of the...achievements. n 2.1 The Vision Machine The overall organization of tie Vision Machine systeliis ased. o parallel processing of tie images by independent...smoothed and made dense by exploiting known constraints within each process (for example., that disparity is smooth). This is the stage of approximation

  7. Parallel strategies for SAR processing

    NASA Astrophysics Data System (ADS)

    Segoviano, Jesus A.

    2004-12-01

    This article proposes a series of strategies for improving the computer process of the Synthetic Aperture Radar (SAR) signal treatment, following the three usual lines of action to speed up the execution of any computer program. On the one hand, it is studied the optimization of both, the data structures and the application architecture used on it. On the other hand it is considered a hardware improvement. For the former, they are studied both, the usually employed SAR process data structures, proposing the use of parallel ones and the way the parallelization of the algorithms employed on the process is implemented. Besides, the parallel application architecture classifies processes between fine/coarse grain. These are assigned to individual processors or separated in a division among processors, all of them in their corresponding architectures. For the latter, it is studied the hardware employed on the computer parallel process used in the SAR handling. The improvement here refers to several kinds of platforms in which the SAR process is implemented, shared memory multicomputers, and distributed memory multiprocessors. A comparison between them gives us some guidelines to follow in order to get a maximum throughput with a minimum latency and a maximum effectiveness with a minimum cost, all together with a limited complexness. It is concluded and described, that the approach consisting of the processing of the algorithms in a GNU/Linux environment, together with a Beowulf cluster platform offers, under certain conditions, the best compromise between performance and cost, and promises the major development in the future for the Synthetic Aperture Radar computer power thirsty applications in the next years.

  8. Parallel Power Grid Simulation Toolkit

    SciTech Connect

    Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol

    2015-09-14

    ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGid, named FSKIT, is intended to support the coupling multiple continuous and discrete even parallel simulations. The code is designed using modern object oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  9. Parallel processing of genomics data

    NASA Astrophysics Data System (ADS)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, have made possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, thus the analysis of this enormous flow of data poses several challenges in term of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the parallel preprocessing and statistical analysis of genomics data, able to face high dimension of data and resulting in good response time. The proposed system is able to find statistically significant biological markers able to discriminate classes of patients that respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.

  10. Highly parallel sparse Cholesky factorization

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1990-01-01

    Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.

  11. Parallel Markov chain Monte Carlo simulations.

    PubMed

    Ren, Ruichao; Orkoulas, G

    2007-06-07

    With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.

  12. Parallel Markov chain Monte Carlo simulations

    NASA Astrophysics Data System (ADS)

    Ren, Ruichao; Orkoulas, G.

    2007-06-01

    With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.

  13. Allinea DDT as a Parallel Debugging Alternative to Totalview

    SciTech Connect

    Antypas, K.B.

    2007-03-05

    Totalview, from the Etnus Corporation, is a sophisticated and feature rich software debugger for parallel applications. As Totalview has gained in popularity and market share its pricing model has increased to the point where it is often prohibitively expensive for massively parallel supercomputers. Additionally, many of Totalview's advanced features are not used by members of the scientific computing community. For these reasons, supercomputing centers have begun to search for a basic parallel debugging tool which can be used as an alternative to Totalview. As the cost and complexity of Totalview has increased over the years, scientific computing centers have started searching for a viable parallel debugging alternative. DDT (Distributed Debugging Tool) from Allinea Software is a relatively new parallel debugging tool which aims to provide much of the same functionality as Totalview. This review outlines the basic features and limitations of DDT to determine if it can be a reasonable substitute for Totalview. DDT was tested on the NERSC platforms Bassi, Seaborg, Jacquard and Davinci with Fortran90, C, and C++ codes using MPI and OpenMP for parallelism.

  14. The probability of parallel genetic evolution from standing genetic variation.

    PubMed

    MacPherson, A; Nuismer, S L

    2017-02-01

    Parallel evolution is often assumed to result from repeated adaptation to novel, yet ecologically similar, environments. Here, we develop and analyse a mathematical model that predicts the probability of parallel genetic evolution from standing genetic variation as a function of the strength of phenotypic selection and constraints imposed by genetic architecture. Our results show that the probability of parallel genetic evolution increases with the strength of natural selection and effective population size and is particularly likely to occur for genes with large phenotypic effects. Building on these results, we develop a Bayesian framework for estimating the strength of parallel phenotypic selection from genetic data. Using extensive individual-based simulations, we show that our estimator is robust across a wide range of genetic and evolutionary scenarios and provides a useful tool for rigorously testing the hypothesis that parallel genetic evolution is the result of adaptive evolution. An important result that emerges from our analyses is that existing studies of parallel genetic evolution frequently rely on data that is insufficient for distinguishing between adaptive evolution and neutral evolution driven by random genetic drift. Overcoming this challenge will require sampling more populations and the inclusion of larger numbers of loci.

  15. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  16. Optical flow optimization using parallel genetic algorithm

    NASA Astrophysics Data System (ADS)

    Zavala-Romero, Olmo; Botella, Guillermo; Meyer-Bäse, Anke; Meyer Base, Uwe

    2011-06-01

    A new approach to optimize the parameters of a gradient-based optical flow model using a parallel genetic algorithm (GA) is proposed. The main characteristics of the optical flow algorithm are its bio-inspiration and robustness against contrast, static patterns and noise, besides working consistently with several optical illusions where other algorithms fail. This model depends on many parameters which conform the number of channels, the orientations required, the length and shape of the kernel functions used in the convolution stage, among many more. The GA is used to find a set of parameters which improve the accuracy of the optical flow on inputs where the ground-truth data is available. This set of parameters helps to understand which of them are better suited for each type of inputs and can be used to estimate the parameters of the optical flow algorithm when used with videos that share similar characteristics. The proposed implementation takes into account the embarrassingly parallel nature of the GA and uses the OpenMP Application Programming Interface (API) to speedup the process of estimating an optimal set of parameters. The information obtained in this work can be used to dynamically reconfigure systems, with potential applications in robotics, medical imaging and tracking.

  17. Development of a Parallel Redundant STATCOM System

    NASA Astrophysics Data System (ADS)

    Takeda, Masatoshi; Yasuda, Satoshi; Tamai, Shinzo; Morishima, Naoki

    This paper presents a new concept of parallel redundant STATCOM system. This system consists of a number of medium capacity STATCOM units connected in parallel, which can achieve a high operational reliability and functional flexibility. The proposed STATCOM system has such redundant operation characteristics that the remaining STATCOM units can maintain their operation even though some of the STATCOM units are out of service. And also, it has flexible convertibility so that it can be converted to a BTB or a UPFC system easily, according to the diversified change of needs in power systems. In order to realize this concept, the authors developed several important key technologies for the STATCOM, such as the novel PWM scheme that enables effective cancellation of lower order harmonics, GCT inverter technologies with small loss consumption, and the coordination control scheme with capacitor banks to ensure effective dynamic performance with minimum loss consumption. The proposed STATCOM system was put into practical applications, exhibiting excellent performance characteristics at each site.

  18. The interaction of turbulence with parallel and perpendicular shocks

    NASA Astrophysics Data System (ADS)

    Adhikari, L.; Zank, G. P.; Hunana, P.; Hu, Q.

    2016-11-01

    Interplanetary shocks exist in most astrophysical flows, and modify the properties of the background flow. We apply the Zank et al 2012 six coupled turbulence transport model equations to study the interaction of turbulence with parallel and perpendicular shock waves in the solar wind. We model the 1D structure of a stationary perpendicular or parallel shock wave using a hyperbolic tangent function and the Rankine-Hugoniot conditions. A reduced turbulence transport model (the 4-equation model) is applied to parallel and perpendicular shock waves, and solved using a 4th- order Runge Kutta method. We compare the model results with ACE spacecraft observations. We identify one quasi-parallel and one quasi-perpendicular event in the ACE spacecraft data sets, and compute various turbulent observed values such as the fluctuating magnetic and kinetic energy, the energy in forward and backward propagating modes, the total turbulent energy in the upstream and downstream of the shock. We also calculate the error associated with each turbulent observed value, and fit the observed values by a least square method and use a Fourier series fitting function. We find that the theoretical results are in reasonable agreement with observations. The energy in turbulent fluctuations is enhanced and the correlation length is approximately constant at the shock. Similarly, the normalized cross helicity increases across a perpendicular shock, and decreases across a parallel shock.

  19. Parallelizing alternating direction implicit solver on GPUs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...

  20. Implementing clips on a parallel computer

    NASA Technical Reports Server (NTRS)

    Riley, Gary

    1987-01-01

    The C language integrated production system (CLIPS) is a forward chaining rule based language to provide training and delivery for expert systems. Conceptually, rule based languages have great potential for benefiting from the inherent parallelism of the algorithms that they employ. During each cycle of execution, a knowledge base of information is compared against a set of rules to determine if any rules are applicable. Parallelism also can be employed for use with multiple cooperating expert systems. To investigate the potential benefits of using a parallel computer to speed up the comparison of facts to rules in expert systems, a parallel version of CLIPS was developed for the FLEX/32, a large grain parallel computer. The FLEX implementation takes a macroscopic approach in achieving parallelism by splitting whole sets of rules among several processors rather than by splitting the components of an individual rule among processors. The parallel CLIPS prototype demonstrates the potential advantages of integrating expert system tools with parallel computers.

  1. Parallel molecular dynamics: Communication requirements for massively parallel machines

    NASA Astrophysics Data System (ADS)

    Taylor, Valerie E.; Stevens, Rick L.; Arnold, Kathryn E.

    1995-05-01

    Molecular mechanics and dynamics are becoming widely used to perform simulations of molecular systems from large-scale computations of materials to the design and modeling of drug compounds. In this paper we address two major issues: a good decomposition method that can take advantage of future massively parallel processing systems for modest-sized problems in the range of 50,000 atoms and the communication requirements needed to achieve 30 to 40% efficiency on MPPs. We analyzed a scalable benchmark molecular dynamics program executing on the Intel Touchstone Deleta parallelized with an interaction decomposition method. Using a validated analytical performance model of the code, we determined that for an MPP with a four-dimensional mesh topology and 400 MHz processors the communication startup time must be at most 30 clock cycles and the network bandwidth must be at least 2.3 GB/s. This configuration results in 30 to 40% efficiency of the MPP for a problem with 50,000 atoms executing on 50,000 processors.

  2. Detection of multiple sinusoids using a parallel ale

    SciTech Connect

    David, R.A.

    1984-01-01

    This paper introduces an Adaptive Line Enhancer (ALE) whose parallel structure enables the detection and enhancement of multiple sinusoids. A function describing the performance surface is derived for the case where several line signals are buried in white noise. A steepest descent adaptive algorithm is derived, and simulations are used to demonstrate its performance.

  3. Force user's manual: A portable, parallel FORTRAN

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1990-01-01

    The use of Force, a parallel, portable FORTRAN on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, Alliant computers on which it is installed.

  4. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  5. Debugging Parallel Programs with Instant Replay.

    DTIC Science & Technology

    1986-09-01

    produce the same results. In this paper we present a general solution for reproducing the execution behavior of parallel programs, termed Instant Replay...Instant Replay on the BBN Butterfly Parallel Processor, and discuss how it can be incorporated into the debugging cycle for parallel programs. This...program often do not produce the same results. In this paper we present a general solution for reproducing the execution behavior of parallel

  6. Parallel machine architecture and compiler design facilities

    NASA Technical Reports Server (NTRS)

    Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

    1990-01-01

    The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of Delta project (which objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target toward different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.

  7. High Performance Parallel Computational Nanotechnology

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. In as much as it seems clear that the application of such methods in nanotechnology will require powerful, highly powerful systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to

  8. Parallel Computing Using Web Servers and "Servlets".

    ERIC Educational Resources Information Center

    Lo, Alfred; Bloor, Chris; Choi, Y. K.

    2000-01-01

    Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

  9. Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

    ERIC Educational Resources Information Center

    Agarwal, Mayank

    2009-01-01

    The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…

  10. Parallel Processing at the High School Level.

    ERIC Educational Resources Information Center

    Sheary, Kathryn Anne

    This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

  11. Coordination in serial-parallel image processing

    NASA Astrophysics Data System (ADS)

    Wójcik, Waldemar; Dubovoi, Vladymyr M.; Duda, Marina E.; Romaniuk, Ryszard S.; Yesmakhanova, Laura; Kozbakova, Ainur

    2015-12-01

    Serial-parallel systems used to convert the image. The control of their work results with the need to solve coordination problem. The paper summarizes the model of coordination of resource allocation in relation to the task of synchronizing parallel processes; the genetic algorithm of coordination developed, its adequacy verified in relation to the process of parallel image processing.

  12. Scalable Parallel Algebraic Multigrid Solvers

    SciTech Connect

    Bank, R; Lu, S; Tong, C; Vassilevski, P

    2005-03-23

    The authors propose a parallel algebraic multilevel algorithm (AMG), which has the novel feature that the subproblem residing in each processor is defined over the entire partition domain, although the vast majority of unknowns for each subproblem are associated with the partition owned by the corresponding processor. This feature ensures that a global coarse description of the problem is contained within each of the subproblems. The advantages of this approach are that interprocessor communication is minimized in the solution process while an optimal order of convergence rate is preserved; and the speed of local subproblem solvers can be maximized using the best existing sequential algebraic solvers.

  13. Parallel Assembly of LIGA Components

    SciTech Connect

    Christenson, T.R.; Feddema, J.T.

    1999-03-04

    In this paper, a prototype robotic workcell for the parallel assembly of LIGA components is described. A Cartesian robot is used to press 386 and 485 micron diameter pins into a LIGA substrate and then place a 3-inch diameter wafer with LIGA gears onto the pins. Upward and downward looking microscopes are used to locate holes in the LIGA substrate, pins to be pressed in the holes, and gears to be placed on the pins. This vision system can locate parts within 3 microns, while the Cartesian manipulator can place the parts within 0.4 microns.

  14. True Shear Parallel Plate Viscometer

    NASA Technical Reports Server (NTRS)

    Ethridge, Edwin; Kaukler, William

    2010-01-01

    This viscometer (which can also be used as a rheometer) is designed for use with liquids over a large temperature range. The device consists of horizontally disposed, similarly sized, parallel plates with a precisely known gap. The lower plate is driven laterally with a motor to apply shear to the liquid in the gap. The upper plate is freely suspended from a double-arm pendulum with a sufficiently long radius to reduce height variations during the swing to negligible levels. A sensitive load cell measures the shear force applied by the liquid to the upper plate. Viscosity is measured by taking the ratio of shear stress to shear rate.

  15. Scheduling Tasks In Parallel Processing

    NASA Technical Reports Server (NTRS)

    Price, Camille C.; Salama, Moktar A.

    1989-01-01

    Algorithms sought to minimize time and cost of computation. Report describes research on scheduling of computations tasks in system of multiple identical data processors operating in parallel. Computational intractability requires use of suboptimal heuristic algorithms. First algorithm called "list heuristic", variation of classical list scheduling. Second algorithm called "cluster heuristic" applied to tightly coupled tasks and consists of four phases. Third algorithm called "exchange heuristic", iterative-improvement algorithm beginning with initial feasible assignment of tasks to processors and periods of time. Fourth algorithm is iterative one for optimal assignment of tasks and based on concept called "simulated annealing" because of mathematical resemblance to aspects of physical annealing processes.

  16. Heart Fibrillation and Parallel Supercomputers

    NASA Technical Reports Server (NTRS)

    Kogan, B. Y.; Karplus, W. J.; Chudin, E. E.

    1997-01-01

    The Luo and Rudy 3 cardiac cell mathematical model is implemented on the parallel supercomputer CRAY - T3D. The splitting algorithm combined with variable time step and an explicit method of integration provide reasonable solution times and almost perfect scaling for rectilinear wave propagation. The computer simulation makes it possible to observe new phenomena: the break-up of spiral waves caused by intracellular calcium and dynamics and the non-uniformity of the calcium distribution in space during the onset of the spiral wave.

  17. Development of massively parallel quantum chemistry program SMASH

    SciTech Connect

    Ishimura, Kazuya

    2015-12-31

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C{sub 150}H{sub 30}){sub 2} with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  18. Local and nonlocal parallel heat transport in general magnetic fields

    SciTech Connect

    Del-Castillo-Negrete, Diego B; Chacon, Luis

    2011-01-01

    A novel approach for the study of parallel transport in magnetized plasmas is presented. The method avoids numerical pollution issues of grid-based formulations and applies to integrable and chaotic magnetic fields with local or nonlocal parallel closures. In weakly chaotic fields, the method gives the fractal structure of the devil's staircase radial temperature profile. In fully chaotic fields, the temperature exhibits self-similar spatiotemporal evolution with a stretched-exponential scaling function for local closures and an algebraically decaying one for nonlocal closures. It is shown that, for both closures, the effective radial heat transport is incompatible with the quasilinear diffusion model.

  19. Parallel Monte Carlo Simulation for control system design

    NASA Technical Reports Server (NTRS)

    Schubert, Wolfgang M.

    1995-01-01

    The research during the 1993/94 academic year addressed the design of parallel algorithms for stochastic robustness synthesis (SRS). SRS uses Monte Carlo simulation to compute probabilities of system instability and other design-metric violations. The probabilities form a cost function which is used by a genetic algorithm (GA). The GA searches for the stochastic optimal controller. The existing sequential algorithm was analyzed and modified to execute in a distributed environment. For this, parallel approaches to Monte Carlo simulation and genetic algorithms were investigated. Initial empirical results are available for the KSR1.

  20. Analysis of the Rotopod: An all revolute parallel manipulator

    SciTech Connect

    Schmitt, D.J.; Benavides, G.L.; Bieg, L.F.; Kozlowski, D.M.

    1998-05-16

    This paper introduces a new configuration of parallel manipulator call the Rotopod which is constructed from all revolute type joints. The Rotopod consists of two platforms connected by six legs and exhibits six Cartesian degrees of freedom. The Rotopod is initially compared with other all revolute joint parallel manipulators to show its similarities and differences. The inverse kinematics for this mechanism are developed and used to analyze the accessible workspace of the mechanism. Optimization is performed to determine the Rotopod design configurations which maximum the accessible workspace based on desirable functional constraints.

  1. Xyce parallel electronic simulator design.

    SciTech Connect

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to insure a high level of code quality and robustness are essential. Version control, issue tracking customer support, C++ style guildlines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathmaticians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

  2. Parallel job-scheduling algorithms

    SciTech Connect

    Rodger, S.H.

    1989-01-01

    In this thesis, we consider solving job scheduling problems on the CREW PRAM model. We show how to adapt Cole's pipeline merge technique to yield several efficient parallel algorithms for a number of job scheduling problems and one optimal parallel algorithm for the following job scheduling problem: Given a set of n jobs defined by release times, deadlines and processing times, find a schedule that minimizes the maximum lateness of the jobs and allows preemption when the jobs are scheduled to run on one machine. In addition, we present the first NC algorithm for the following job scheduling problem: Given a set of n jobs defined by release times, deadlines and unit processing times, determine if there is a schedule of jobs on one machine, and calculate the schedule if it exists. We identify the notion of a canonical schedule, which is the type of schedule our algorithm computes if there is a schedule. Our algorithm runs in O((log n){sup 2}) time and uses O(n{sup 2}k{sup 2}) processors, where k is the minimum number of distinct offsets of release times or deadlines.

  3. Partitioning in parallel processing of production systems

    SciTech Connect

    Oflazer, K.

    1987-01-01

    This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.

  4. New Parallel computing framework for radiation transport codes

    SciTech Connect

    Kostin, M.A.; Mokhov, N.V.; Niita, K.; /JAERI, Tokai

    2010-09-01

    A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one thus combining results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.

  5. Parallel iterative solvers and preconditioners using approximate hierarchical methods

    SciTech Connect

    Grama, A.; Kumar, V.; Sameh, A.

    1996-12-31

    In this paper, we report results of the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on the convergence and show that with minimal loss in accuracy, our solver yields significant speedups. We demonstrate the excellent parallel efficiency and scalability of our solver. The combined speedups from approximation and parallelism represent an improvement of several orders in solution time. We also develop fast and paralellizable preconditioners for this problem. We report on the performance of an inner-outer scheme and a preconditioner based on truncated Green`s function. Experimental results on a 256 processor Cray T3D are presented.

  6. Parallelization strategy for large-scale vibronic coupling calculations.

    PubMed

    Rabidoux, Scott M; Eijkhout, Victor; Stanton, John F

    2014-12-26

    The vibronic coupling model of Köppel, Domcke, and Cederbaum is a powerful means to understand, predict, and analyze electronic spectra of molecules, especially those that exhibit phenomena that involve breakdown of the Born-Oppenheimer approximation. In this work, we describe a new parallel algorithm for carrying out such calculations. The algorithm is conceptually founded upon a "stencil" representation of the required computational steps, which motivates an efficient strategy for coarse-grained parallelization. The equations involved in the direct-CI type diagonalization of the model Hamiltonian are presented, the parallelization strategy is discussed in detail, and the method is illustrated by calculations involving direct-product basis sets with as many as 17 vibrational modes and 130 billion basis functions.

  7. Adaptive, multiresolution visualization of large data sets using parallel octrees.

    SciTech Connect

    Freitag, L. A.; Loy, R. M.

    1999-06-10

    The interactive visualization and exploration of large scientific data sets is a challenging and difficult task; their size often far exceeds the performance and memory capacity of even the most powerful graphics work-stations. To address this problem, we have created a technique that combines hierarchical data reduction methods with parallel computing to allow interactive exploration of large data sets while retaining full-resolution capability. The hierarchical representation is built in parallel by strategically inserting field data into an octree data structure. We provide functionality that allows the user to interactively adapt the resolution of the reduced data sets so that resolution is increased in regions of interest without sacrificing local graphics performance. We describe the creation of the reduced data sets using a parallel octree, the software architecture of the system, and the performance of this system on the data from a Rayleigh-Taylor instability simulation.

  8. Parallel algorithm of VLBI software correlator under multiprocessor environment

    NASA Astrophysics Data System (ADS)

    Zheng, Weimin; Zhang, Dong

    2007-11-01

    The correlator is the key signal processing equipment of a Very Lone Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass data collected by the VLBI observatories and produces the visibility function of the target, which can be used to spacecraft position, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is a task of data intensive and computation intensive. This paper presents the algorithms of two parallel software correlators under multiprocessor environments. A near real-time correlator for spacecraft tracking adopts the pipelining and thread-parallel technology, and runs on the SMP (Symmetric Multiple Processor) servers. Another high speed prototype correlator using the mixed Pthreads and MPI (Massage Passing Interface) parallel algorithm is realized on a small Beowulf cluster platform. Both correlators have the characteristic of flexible structure, scalability, and with 10-station data correlating abilities.

  9. Parallel job scheduling policies to improve fairness : a case study.

    SciTech Connect

    Leung, Vitus Joseph; Sabin, Gerald; Sadayappan, Ponnuswamy

    2008-02-01

    Balancing fairness, user performance, and system performance is a critical concern when developing and installing parallel schedulers. Sandia uses a customized scheduler to manage many of their parallel machines. A primary function of the scheduler is to ensure that the machines have good utilization and that users are treated in a 'fair' manner. A separate compute process allocator (CPA) ensures that the jobs on the machines are not too fragmented in order to maximize throughput. Until recently, there has been no established technique to measure the fairness of parallel job schedulers. This paper introduces a 'hybrid' fairness metric that is similar to recently proposed metrics. The metric uses the Sandia version of a 'fairshare' queuing priority as the basis for fairness. The hybrid fairness metric is used to evaluate a Sandia workload. Using these results, multiple scheduling strategies are introduced to improve performance while satisfying user and system performance constraints.

  10. QCMPI: A parallel environment for quantum computing

    NASA Astrophysics Data System (ADS)

    Tabakin, Frank; Juliá-Díaz, Bruno

    2009-06-01

    QCMPI is a quantum computer (QC) simulation package written in Fortran 90 with parallel processing capabilities. It is an accessible research tool that permits rapid evaluation of quantum algorithms for a large number of qubits and for various "noise" scenarios. The prime motivation for developing QCMPI is to facilitate numerical examination of not only how QC algorithms work, but also to include noise, decoherence, and attenuation effects and to evaluate the efficacy of error correction schemes. The present work builds on an earlier Mathematica code QDENSITY, which is mainly a pedagogic tool. In that earlier work, although the density matrix formulation was featured, the description using state vectors was also provided. In QCMPI, the stress is on state vectors, in order to employ a large number of qubits. The parallel processing feature is implemented by using the Message-Passing Interface (MPI) protocol. A description of how to spread the wave function components over many processors is provided, along with how to efficiently describe the action of general one- and two-qubit operators on these state vectors. These operators include the standard Pauli, Hadamard, CNOT and CPHASE gates and also Quantum Fourier transformation. These operators make up the actions needed in QC. Codes for Grover's search and Shor's factoring algorithms are provided as examples. A major feature of this work is that concurrent versions of the algorithms can be evaluated with each version subject to alternate noise effects, which corresponds to the idea of solving a stochastic Schrödinger equation. The density matrix for the ensemble of such noise cases is constructed using parallel distribution methods to evaluate its eigenvalues and associated entropy. Potential applications of this powerful tool include studies of the stability and correction of QC processes using Hamiltonian based dynamics. Program summaryProgram title: QCMPI Catalogue identifier: AECS_v1_0 Program summary URL

  11. A parallel coupled oceanic-atmospheric general circulation model

    SciTech Connect

    Wehner, M.F.; Bourgeois, A.J.; Eltgroth, P.G.; Duffy, P.B.; Dannevik, W.P.

    1994-12-01

    The Climate Systems Modeling group at LLNL has developed a portable coupled oceanic-atmospheric general circulation model suitable for use on a variety of massively parallel (MPP) computers of the multiple instruction, multiple data (MIMD) class. The model is composed of parallel versions of the UCLA atmospheric general circulation model, the GFDL modular ocean model (MOM) and a dynamic sea ice model based on the Hiber formulation extracted from the OPYC ocean model. The strategy to achieve parallelism is twofold. One level of parallelism is accomplished by applying two dimensional domain decomposition techniques to each of the three constituent submodels. A second level of parallelism is attained by a concurrent execution of AGCM and OGCM/sea ice components on separate sets of processors. For this functional decomposition scheme, a flux coupling module has been written to calculate the heat, moisture and momentum fluxes independent of either the AGCM or the OGCM modules. The flux coupler`s other roles are to facilitate the transfer of data between subsystem components and processors via message passing techniques and to interpolate and aggregate between the possibly incommensurate meshes.

  12. A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database

    DOE PAGES

    Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; ...

    2013-01-01

    Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm of creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order inmore » a 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.« less

  13. Massively parallel sequencing: the next big thing in genetic medicine.

    PubMed

    Tucker, Tracy; Marra, Marco; Friedman, Jan M

    2009-08-01

    Massively parallel sequencing has reduced the cost and increased the throughput of genomic sequencing by more than three orders of magnitude, and it seems likely that costs will fall and throughput improve even more in the next few years. Clinical use of massively parallel sequencing will provide a way to identify the cause of many diseases of unknown etiology through simultaneous screening of thousands of loci for pathogenic mutations and by sequencing biological specimens for the genomic signatures of novel infectious agents. In addition to providing these entirely new diagnostic capabilities, massively parallel sequencing may also replace arrays and Sanger sequencing in clinical applications where they are currently being used. Routine clinical use of massively parallel sequencing will require higher accuracy, better ways to select genomic subsets of interest, and improvements in the functionality, speed, and ease of use of data analysis software. In addition, substantial enhancements in laboratory computer infrastructure, data storage, and data transfer capacity will be needed to handle the extremely large data sets produced. Clinicians and laboratory personnel will require training to use the sequence data effectively, and appropriate methods will need to be developed to deal with the incidental discovery of pathogenic mutations and variants of uncertain clinical significance. Massively parallel sequencing has the potential to transform the practice of medical genetics and related fields, but the vast amount of personal genomic data produced will increase the responsibility of geneticists to ensure that the information obtained is used in a medically and socially responsible manner.

  14. Parallel Calculation of CCSDT and Mk-MRCCSDT Energies.

    PubMed

    Prochnow, Eric; Harding, Michael E; Gauss, Jürgen

    2010-08-10

    A scheme for the parallel calculation of energies at the coupled-cluster singles, doubles, and triples (CCSDT) level of theory, several approximate iterative CCSDT schemes (CCSDT-1a, CCSDT-1b, CCSDT-2, CCSDT-3, and CC3), and for the state-specific multireference coupled-cluster ansatz suggested by Mukherjee with a full treatment of triple excitations (Mk-MRCCSDT) is presented. The proposed scheme is based on the adaptation of a highly efficient serial coupled-cluster code leading to a communication-minimized implementation by parallelizing the time-determining steps. The parallel algorithm is tailored for affordable cluster architectures connected by standard communication networks such as Gigabit Ethernet. In this way, CCSDT and Mk-MRCCSDT computations become feasible even for larger molecular systems and basis sets. An analysis of the time-determining steps for CCSDT and Mk-MRCCSDT, namely the computation of the triple-excitation amplitudes and their individual contributions, is carried out. Benchmark calculations are presented for the N2O, ozone, and benzene molecules, proving that the parallelization of these steps is sufficient to obtain an efficient parallel scheme. A first application to the case of 2,6-pyridyne using a triple-ζ quality basis (222 basis functions) is presented demonstrating the efficiency of the current implementation.

  15. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1991-01-01

    The main contribution of the effort in the last two years is the introduction of the MOPPS system. After doing extensive literature search, we introduced the system which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications on a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.

  16. Parallel Performance Characterization of Columbia

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak

    2004-01-01

    Using a collection of benchmark problems of increasing levels of realism and computational effort, we will characterize the strengths and limitations of the 10,240 processor Columbia system to deliver supercomputing value to application scientists. Scientists need to be able to determine if and how they can utilize Columbia to carry extreme workloads, either in terms of ultra-large applications that cannot be run otherwise (capability), or in terms of very large ensembles of medium-scale applications to populate response matrices (capacity). We select existing application benchmarks that scale from a small number of processors to the entire machine, and that highlight different issues in running supercomputing-calss applicaions, such as the various types of memory access, file I/O, inter- and intra-node communications and parallelization paradigms. http://www.nas.nasa.gov/Software/NPB/

  17. Information hiding in parallel programs

    SciTech Connect

    Foster, I.

    1992-01-30

    A fundamental principle in program design is to isolate difficult or changeable design decisions. Application of this principle to parallel programs requires identification of decisions that are difficult or subject to change, and the development of techniques for hiding these decisions. We experiment with three complex applications, and identify mapping, communication, and scheduling as areas in which decisions are particularly problematic. We develop computational abstractions that hide such decisions, and show that these abstractions can be used to develop elegant solutions to programming problems. In particular, they allow us to encode common structures, such as transforms, reductions, and meshes, as software cells and templates that can reused in different applications. An important characteristic of these structures is that they do not incorporate mapping, communication, or scheduling decisions: these aspects of the design are specified separately, when composing existing structures to form applications. This separation of concerns allows the same cells and templates to be reused in different contexts.

  18. Embodied and Distributed Parallel DJing.

    PubMed

    Cappelen, Birgitta; Andersson, Anders-Petter

    2016-01-01

    Everyone has a right to take part in cultural events and activities, such as music performances and music making. Enforcing that right, within Universal Design, is often limited to a focus on physical access to public areas, hearing aids etc., or groups of persons with special needs performing in traditional ways. The latter might be people with disabilities, being musicians playing traditional instruments, or actors playing theatre. In this paper we focus on the innovative potential of including people with special needs, when creating new cultural activities. In our project RHYME our goal was to create health promoting activities for children with severe disabilities, by developing new musical and multimedia technologies. Because of the users' extreme demands and rich contribution, we ended up creating both a new genre of musical instruments and a new art form. We call this new art form Embodied and Distributed Parallel DJing, and the new genre of instruments for Empowering Multi-Sensorial Things.

  19. Parallel spinors on flat manifolds

    NASA Astrophysics Data System (ADS)

    Sadowski, Michał

    2006-05-01

    Let p(M) be the dimension of the vector space of parallel spinors on a closed spin manifold M. We prove that every finite group G is the holonomy group of a closed flat spin manifold M(G) such that p(M(G))>0. If the holonomy group Hol(M) of M is cyclic, then we give an explicit formula for p(M) another than that given in [R.J. Miatello, R.A. Podesta, The spectrum of twisted Dirac operators on compact flat manifolds, Trans. Am. Math. Soc., in press]. We answer the question when p(M)>0 if Hol(M) is a cyclic group of prime order or dim⁡M≤4.

  20. Device for balancing parallel strings

    DOEpatents

    Mashikian, Matthew S.

    1985-01-01

    A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.

  1. Parallel network simulations with NEURON.

    PubMed

    Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L

    2006-10-01

    The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.

  2. Switching of a TEA CO(2) laser with its own UV emitting parallel spark channels.

    PubMed

    Nilaya, J P; Raote, Pallavi; Patil, Gautam; Biswas, Dhruba J

    2007-01-08

    The efficient operation of a TEA CO(2) laser wherein the parallel spark channel preioniser of the laser itself functioned as a switch is reported. Simultaneous closure of the parallel gaps without an external switch has been achieved by ballasting them with mutually coupled inductances. The repetitive operation capability of such a laser is also discussed.

  3. Anisotropic behaviour of transmission through thin superconducting NbN film in parallel magnetic field

    NASA Astrophysics Data System (ADS)

    Šindler, M.; Tesař, R.; Koláček, J.; Skrbek, L.

    2017-02-01

    Transmission of terahertz waves through a thin layer of the superconductor NbN deposited on an anisotropic R-cut sapphire substrate is studied as a function of temperature in a magnetic field oriented parallel with the sample. A significant difference is found between transmitted intensities of beams linearly polarised parallel with and perpendicular to the direction of applied magnetic field.

  4. Parallel computing in enterprise modeling.

    SciTech Connect

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principal makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  5. Massively parallel hypercube FFTs: CM-2 implementation and error analysis of a parallel trigonometric factor generation method

    SciTech Connect

    Tong, C.H. ); Swarztrauber, P.N. )

    1991-08-01

    On parallel computers, the way the data elements are mapped to the processors may have a large effect on the timing performance of a given algorithm. In our previous paper, we have examined a few mapping strategies for the ordered radix-2 DIF (decimation-in-frequency) Fast Fourier Transform. In particular, we have shown how reduction of communication can be achieved by combining the order and computational phases through the use of i-cycles. A parallel methods was also presented for computing the trigonometric factors which requires neither trigonometric function evaluation nor interprocessor communication. This paper first reviews some of the experimental results on the Connection Machine to demonstrate the importance of reducing communication in a parallel algorithm. The emphasis of this paper, however, is on analyzing the numerical stability of the proposed method for generating the trigonometric factors and showing how the error can be improved. 16 refs., 12 tabs.

  6. APPSPACK 4.0 : asynchronous parallel pattern search for derivative-free optimization.

    SciTech Connect

    Gray, Genetha Anne; Kolda, Tamara Gibson

    2004-12-01

    APPSPACK is software for solving unconstrained and bound constrained optimization problems. It implements an asynchronous parallel pattern search method that has been specifically designed for problems characterized by expensive function evaluations. Using APPSPACK to solve optimization problems has several advantages: No derivative information is needed; the procedure for evaluating the objective function can be executed via a separate program or script; the code can be run in serial or parallel, regardless of whether or not the function evaluation itself is parallel; and the software is freely available. We describe the underlying algorithm, data structures, and features of APPSPACK version 4.0 as well as how to use and customize the software.

  7. Integrated Task and Data Parallel Programming

    NASA Technical Reports Server (NTRS)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated

  8. The design and implementation of MPI master-slave parallel genetic algorithm

    NASA Astrophysics Data System (ADS)

    Liu, Shuping; Cheng, Yanliu

    2013-03-01

    In this paper, the MPI master-slave parallel genetic algorithm is implemented by analyzing the basic genetic algorithm and parallel MPI program, and building a Linux cluster. This algorithm is used for the test of maximum value problems (Rosen brocks function) .And we acquire the factors influencing the master-slave parallel genetic algorithm by deriving from the analysis of test data. The experimental data shows that the balanced hardware configuration and software design optimization can improve the performance of system in the complexity of the computing environment using the master-slave parallel genetic algorithms.

  9. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  10. Parallel processing considerations for image recognition tasks

    NASA Astrophysics Data System (ADS)

    Simske, Steven J.

    2011-01-01

    Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows-as diverse as optical character recognition [OCR], document classification and barcode reading-to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.

  11. [Parallelism and paradoxes on viability and the life span of two loss-of-function mutations: heat shock protein transcriptional regulator hsf(1) and l(2)gl tumor suppressor in D. melanogaster].

    PubMed

    Vaĭsman, N Ia; Evgen'ev, M B; Golubovskiĭ, M D

    2012-01-01

    In this study we analyzed how a dosage decrease in mono- and diheterozygotes on both lethal alleles of the lgl-gene and hsfheat shock regulator influences viability and life span at optimal and high temperature 29 degrees C conditions. We found that hsf(1) (1 dosage of active hsf-factor) heterozygote animals had significantly increased viability (up to 30-39%) in case of its development from egg to imago under the stress of 29 degrees C. However, this stress-protective effect of a decreased dosage of HSF1 was suppressed in diheterozygotes, while the dosage of tumor suppressor lgl was simultaneously decreased. Under stress temperature conditions, a decrease in dosage of one of the alleles also increased the average life span and delayed aging, especially in the case of maternal inheritance of each of the loss-of-function mutations. In diheterozygotes the average life span had intermediate meanings. However, in diheterozygote males under stress conditions the positive longevity effect of hsf was suppressed in the presence of the lgl-mutation. Paradoxically, that decrease of expression of each of the studied vital genes provided a positive effect on both viability and life span under stress conditions. However, a simultaneous dosage decrease of two loss-of-function alleles in diheterozygotes resulted in disbalanced effects on the organism level. The received data indicate interaction between HSF1 and LGL gene products during ontogenesis and stress-defending processes.

  12. Parallel CARLOS-3D code development

    SciTech Connect

    Putnam, J.M.; Kotulski, J.D.

    1996-02-01

    CARLOS-3D is a three-dimensional scattering code which was developed under the sponsorship of the Electromagnetic Code Consortium, and is currently used by over 80 aerospace companies and government agencies. The code has been extensively validated and runs on both serial workstations and parallel super computers such as the Intel Paragon. CARLOS-3D is a three-dimensional surface integral equation scattering code based on a Galerkin method of moments formulation employing Rao- Wilton-Glisson roof-top basis for triangular faceted surfaces. Fully arbitrary 3D geometries composed of multiple conducting and homogeneous bulk dielectric materials can be modeled. This presentation describes some of the extensions to the CARLOS-3D code, and how the operator structure of the code facilitated these improvements. Body of revolution (BOR) and two-dimensional geometries were incorporated by simply including new input routines, and the appropriate Galerkin matrix operator routines. Some additional modifications were required in the combined field integral equation matrix generation routine due to the symmetric nature of the BOR and 2D operators. Quadrilateral patched surfaces with linear roof-top basis functions were also implemented in the same manner. Quadrilateral facets and triangular facets can be used in combination to more efficiently model geometries with both large smooth surfaces and surfaces with fine detail such as gaps and cracks. Since the parallel implementation in CARLOS-3D is at high level, these changes were independent of the computer platform being used. This approach minimizes code maintenance, while providing capabilities with little additional effort. Results are presented showing the performance and accuracy of the code for some large scattering problems. Comparisons between triangular faceted and quadrilateral faceted geometry representations will be shown for some complex scatterers.

  13. PREFACE: INERA Workshop: Transition Metal Oxide Thin Films-functional Layers in "Smart windows" and Water Splitting Devices. Parallel session of the 18th International School on Condensed Matter Physics

    NASA Astrophysics Data System (ADS)

    2014-11-01

    The Special issue presents the papers for the INERA Workshop entitled "Transition Metal Oxides as Functional Layers in Smart windows and Water Splitting Devices", which was held in Varna, St. Konstantin and Elena, Bulgaria, from the 4th-6th September 2014. The Workshop is organized within the context of the INERA "Research and Innovation Capacity Strengthening of ISSP-BAS in Multifunctional Nanostructures", FP7 Project REGPOT 316309 program, European project of the Institute of Solid State Physics at the Bulgarian Academy of Sciences. There were 42 participants at the workshop, 16 from Sweden, Germany, Romania and Hungary, 11 invited lecturers, and 28 young participants. There were researchers present from prestigious European laboratories which are leaders in the field of transition metal oxide thin film technologies. The event contributed to training young researchers in innovative thin film technologies, as well as thin films characterization techniques. The topics of the Workshop cover the field of technology and investigation of thin oxide films as functional layers in "Smart windows" and "Water splitting" devices. The topics are related to the application of novel technologies for the preparation of transition metal oxide films and the modification of chromogenic properties towards the improvement of electrochromic and termochromic device parameters for possible industrial deployment. The Workshop addressed the following topics: Metal oxide films-functional layers in energy efficient devices; Photocatalysts and chemical sensing; Novel thin film technologies and applications; Methods of thin films characterizations; From the 37 abstracts sent, 21 manuscripts were written and later refereed. We appreciate the comments from all the referees, and we are grateful for their valuable contributions. Guest Editors: Assoc. Prof. Dr.Tatyana Ivanova Prof. DSc Kostadinka Gesheva Prof. DSc Hassan Chamatti Assoc. Prof. Dr. Georgi Popkirov Workshop Organizing Committee Prof

  14. Measurements of parallel electron velocity distributions using whistler wave absorption

    SciTech Connect

    Thuecks, D. J.; Skiff, F.; Kletzing, C. A.

    2012-08-15

    We describe a diagnostic to measure the parallel electron velocity distribution in a magnetized plasma that is overdense ({omega}{sub pe} > {omega}{sub ce}). This technique utilizes resonant absorption of whistler waves by electrons with velocities parallel to a background magnetic field. The whistler waves were launched and received by a pair of dipole antennas immersed in a cylindrical discharge plasma at two positions along an axial background magnetic field. The whistler wave frequency was swept from somewhat below and up to the electron cyclotron frequency {omega}{sub ce}. As the frequency was swept, the wave was resonantly absorbed by the part of the electron phase space density which was Doppler shifted into resonance according to the relation {omega}-k{sub ||v||} = {omega}{sub ce}. The measured absorption is directly related to the reduced parallel electron distribution function integrated along the wave trajectory. The background theory and initial results from this diagnostic are presented here. Though this diagnostic is best suited to detect tail populations of the parallel electron distribution function, these first results show that this diagnostic is also rather successful in measuring the bulk plasma density and temperature both during the plasma discharge and into the afterglow.

  15. Towards Distributed Memory Parallel Program Analysis

    SciTech Connect

    Quinlan, D; Barany, G; Panas, T

    2008-06-17

    This paper presents a parallel attribute evaluation for distributed memory parallel computer architectures where previously only shared memory parallel support for this technique has been developed. Attribute evaluation is a part of how attribute grammars are used for program analysis within modern compilers. Within this work, we have extended ROSE, a open compiler infrastructure, with a distributed memory parallel attribute evaluation mechanism to support user defined global program analysis required for some forms of security analysis which can not be addressed by a file by file view of large scale applications. As a result, user defined security analyses may now run in parallel without the user having to specify the way data is communicated between processors. The automation of communication enables an extensible open-source parallel program analysis infrastructure.

  16. Runtime volume visualization for parallel CFD

    NASA Technical Reports Server (NTRS)

    Ma, Kwan-Liu

    1995-01-01

    This paper discusses some aspects of design of a data distributed, massively parallel volume rendering library for runtime visualization of parallel computational fluid dynamics simulations in a message-passing environment. Unlike the traditional scheme in which visualization is a postprocessing step, the rendering is done in place on each node processor. Computational scientists who run large-scale simulations on a massively parallel computer can thus perform interactive monitoring of their simulations. The current library provides an interface to handle volume data on rectilinear grids. The same design principles can be generalized to handle other types of grids. For demonstration, we run a parallel Navier-Stokes solver making use of this rendering library on the Intel Paragon XP/S. The interactive visual response achieved is found to be very useful. Performance studies show that the parallel rendering process is scalable with the size of the simulation as well as with the parallel computer.

  17. Parallel reactor systems for bioprocess development.

    PubMed

    Weuster-Botz, Dirk

    2005-01-01

    Controlled parallel bioreactor systems allow fed-batch operation at early stages of process development. The characteristics of shaken bioreactors operated in parallel (shake flask, microtiter plate), sparged bioreactors (small-scale bubble column) and stirred bioreactors (stirred-tank, stirred column) are briefly summarized. Parallel fed-batch operation is achieved with an intermittent feeding and pH-control system for up to 16 bioreactors operated in parallel on a scale of 100 ml. Examples of the scale-up and scale-down of pH-controlled microbial fed-batch processes demonstrate that controlled parallel reactor systems can result in more effective bioprocess development. Future developments are also outlined, including units of 48 parallel stirred-tank reactors with individual pH- and pO2-controls and automation as well as liquid handling system, operated on a scale of ml.

  18. Design considerations for parallel graphics libraries

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  19. Parallel optimization methods for agile manufacturing

    SciTech Connect

    Meza, J.C.; Moen, C.D.; Plantenga, T.D.; Spence, P.A.; Tong, C.H.; Hendrickson, B.A.; Leland, R.W.; Reese, G.M.

    1997-08-01

    The rapid and optimal design of new goods is essential for meeting national objectives in advanced manufacturing. Currently almost all manufacturing procedures involve the determination of some optimal design parameters. This process is iterative in nature and because it is usually done manually it can be expensive and time consuming. This report describes the results of an LDRD, the goal of which was to develop optimization algorithms and software tools that will enable automated design thereby allowing for agile manufacturing. Although the design processes vary across industries, many of the mathematical characteristics of the problems are the same, including large-scale, noisy, and non-differentiable functions with nonlinear constraints. This report describes the development of a common set of optimization tools using object-oriented programming techniques that can be applied to these types of problems. The authors give examples of several applications that are representative of design problems including an inverse scattering problem, a vibration isolation problem, a system identification problem for the correlation of finite element models with test data and the control of a chemical vapor deposition reactor furnace. Because the function evaluations are computationally expensive, they emphasize algorithms that can be adapted to parallel computers.

  20. Multi-petascale highly efficient parallel supercomputer

    DOEpatents

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.

  1. Parallel Quantum Circuit in a Tunnel Junction

    PubMed Central

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

    2016-01-01

    Spectral analysis of 1 and 2-states per line quantum bus are normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, an Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process is not faithfully following the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N2 power law is preserved for Ωab(N) and for Vab(N). PMID:27453262

  2. Parallel Quantum Circuit in a Tunnel Junction

    NASA Astrophysics Data System (ADS)

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

    2016-07-01

    Spectral analysis of 1 and 2-states per line quantum bus are normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, an Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process is not faithfully following the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N2 power law is preserved for Ωab(N) and for Vab(N).

  3. Parallel Quantum Circuit in a Tunnel Junction.

    PubMed

    Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

    2016-07-25

    Spectral analysis of 1 and 2-states per line quantum bus are normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, an Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process is not faithfully following the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N(2) power law is preserved for Ωab(N) and for Vab(N).

  4. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.

  5. Inverse Kinematics for a Parallel Myoelectric Elbow

    DTIC Science & Technology

    2001-10-25

    Inverse Kinematics for a Parallel Myoelectric Elbow A. Z. Escudero, Ja. Álvarez, L. Leija. Center of Research and Advanced Studies of the IPN...replacement above elbow are serial mechanisms driven by a DC motor and they include only one active articulation for the elbow [1]. Parallel mechanisms...are rather scarce [2]. The inverse kinematics model of a 3-degree of freedom parallel prosthetic elbow mechanism is reported. The mathematical

  6. Parallel computations and control of adaptive structures

    NASA Technical Reports Server (NTRS)

    Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

    1991-01-01

    The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction of the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer and the impact of the approach presented for applications in other disciplines than aerospace industry is assessed.

  7. Parallel auto-correlative statistics with VTK.

    SciTech Connect

    Pebay, Philippe Pierre; Bennett, Janine Camille

    2013-08-01

    This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.

  8. Parallel programming in Split-C

    SciTech Connect

    Culler, D.E.; Dusseau, A.; Goldstein, S.C.; Krishnamurthy, A.; Lumetta, S.; Eicken, T. von; Yelick, K.

    1993-12-31

    The authors introduce the Split-C language, a parallel extension of C intended for high performance programming on distributed memory multiprocessors, and demonstrate the use of the language in optimizing parallel programs. Split-C provides a global address space with a clear concept of locality and unusual assignment operators. These are used as tools to reduce the frequency and cost of remote access. The language allows a mixture of shared memory, message passing, and data parallel programming styles while providing efficient access to the underlying machine. They demonstrate the basic language concepts using regular and irregular parallel programs and give performance results for various stages of program optimization.

  9. Shared-memory parallel programming in C++

    SciTech Connect

    Beck, B. )

    1990-07-01

    This paper discusses how researchers have produced a set of portable parallel-programming constructs for C, implemented in M4 macros. These parallel-programming macros are available under the name Parmacs. The Parmacs macros let one write parallel C programs for shared-memory, distributed-memory, and mixed-memory (shared and distributed) systems. They have been implemented on several machines. Because Parmacs offers useful parallel-programming features, the author has considered how these problems might be overcome or avoided. The author thought that using C++, rather than C, would address these problems adequately, and describes the C++ features exploited. The work described addresses shared-memory constructs.

  10. Parallel Algorithms for the Exascale Era

    SciTech Connect

    Robey, Robert W.

    2016-10-19

    New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.

  11. A parallel algorithm for global routing

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall J.; Banerjee, Prithviraj

    1990-01-01

    A Parallel Hierarchical algorithm for Global Routing (PHIGURE) is presented. The router is based on the work of Burstein and Pelavin, but has many extensions for general global routing and parallel execution. Main features of the algorithm include structured hierarchical decomposition into separate independent tasks which are suitable for parallel execution and adaptive simplex solution for adding feedthroughs and adjusting channel heights for row-based layout. Alternative decomposition methods and the various levels of parallelism available in the algorithm are examined closely. The algorithm is described and results are presented for a shared-memory multiprocessor implementation.

  12. Conformal pure radiation with parallel rays

    NASA Astrophysics Data System (ADS)

    Leistner, Thomas; Nurowski, Paweł

    2012-03-01

    We define pure radiation metrics with parallel rays to be n-dimensional pseudo-Riemannian metrics that admit a parallel null line bundle K and whose Ricci tensor vanishes on vectors that are orthogonal to K. We give necessary conditions in terms of the Weyl, Cotton and Bach tensors for a pseudo-Riemannian metric to be conformal to a pure radiation metric with parallel rays. Then, we derive conditions in terms of the tractor calculus that are equivalent to the existence of a pure radiation metric with parallel rays in a conformal class. We also give analogous results for n-dimensional pseudo-Riemannian pp-waves.

  13. Parallel Genetic Algorithm for Alpha Spectra Fitting

    NASA Astrophysics Data System (ADS)

    García-Orellana, Carlos J.; Rubio-Montero, Pilar; González-Velasco, Horacio

    2005-01-01

    We present a performance study of alpha-particle spectra fitting using parallel Genetic Algorithm (GA). The method uses a two-step approach. In the first step we run parallel GA to find an initial solution for the second step, in which we use Levenberg-Marquardt (LM) method for a precise final fit. GA is a high resources-demanding method, so we use a Beowulf cluster for parallel simulation. The relationship between simulation time (and parallel efficiency) and processors number is studied using several alpha spectra, with the aim of obtaining a method to estimate the optimal processors number that must be used in a simulation.

  14. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1995-01-01

    The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.

  15. Parallel Computing for Brain Simulation.

    PubMed

    Pastur-Romay, L A; Porto-Pazos, A B; Cedrón, F; Pazos, A

    2016-11-04

    The human brain is the most complex system in the known universe, but it is the most unknown system. It allows the human beings to possess extraordinary capacities. However, we don´t understand yet how and why most of these capacities are produced. For decades, it have been tried that the computers reproduces these capacities. On one hand, to help understanding the nervous system. On the other hand, to process the data in a more efficient way than before. It is intended to make the computers process the information like the brain does it. The important technological developments and the big multidisciplinary projects have allowed create the first simulation with a number of neurons similar to the human brain neurons number. This paper presents an update review about the main research projects that are trying of simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the actual applications of these works and also the future trends. We have reviewed some works that look for a step forward in Neuroscience and other ones that look for a breakthrough in Computer Science (neuromorphic hardware, machine learning techniques). We summarize the most outstanding characteristics of them and present the latest advances and future plans. In addition, this review remarks the importance of considering not only neurons: the computational models of the brain should include glial cells, given the proven importance of the astrocytes in the information processing.

  16. Parallel methods for the flight simulation model

    SciTech Connect

    Xiong, Wei Zhong; Swietlik, C.

    1994-06-01

    The Advanced Computer Applications Center (ACAC) has been involved in evaluating advanced parallel architecture computers and the applicability of these machines to computer simulation models. The advanced systems investigated include parallel machines with shared. memory and distributed architectures consisting of an eight processor Alliant FX/8, a twenty four processor sor Sequent Symmetry, Cray XMP, IBM RISC 6000 model 550, and the Intel Touchstone eight processor Gamma and 512 processor Delta machines. Since parallelizing a truly efficient application program for the parallel machine is a difficult task, the implementation for these machines in a realistic setting has been largely overlooked. The ACAC has developed considerable expertise in optimizing and parallelizing application models on a collection of advanced multiprocessor systems. One of aspect of such an application model is the Flight Simulation Model, which used a set of differential equations to describe the flight characteristics of a launched missile by means of a trajectory. The Flight Simulation Model was written in the FORTRAN language with approximately 29,000 lines of source code. Depending on the number of trajectories, the computation can require several hours to full day of CPU time on DEC/VAX 8650 system. There is an impetus to reduce the execution time and utilize the advanced parallel architecture computing environment available. ACAC researchers developed a parallel method that allows the Flight Simulation Model to be able to run in parallel on the multiprocessor system. For the benchmark data tested, the parallel Flight Simulation Model implemented on the Alliant FX/8 has achieved nearly linear speedup. In this paper, we describe a parallel method for the Flight Simulation Model. We believe the method presented in this paper provides a general concept for the design of parallel applications. This concept, in most cases, can be adapted to many other sequential application programs.

  17. Recent Improvements to HST Parallel Scheduling

    NASA Astrophysics Data System (ADS)

    Henry, Ronald; Butschky, Mike

    The Hubble Space Telescope (HST) has several scientific instruments (SIs) that may be used at any given time. Most primary visits submitted by HST observers only use one SI, leaving the other SIs free to be requested by ``pure parallel'' observing programs. In order to accomplish this, separate scheduling units (SUs) for each parallel SI must be created and then scheduled by the Science Planning and Scheduling System (SPSS), taking into account numerous orbital and scientific constraints. The Parallel Observation Matching System (POMS) has the task of matching parallel visits to primary observations and ``crafting'' appropriate parallel SUs at each opportunity, taking scientific criteria and orbital constraints into account. The process for planning and scheduling parallel observations is thus quite different from the process for primary science. In the past, custom crafting rules for each parallel program were necessary, requiring full-time support from a software developer. In addition, because POMS ran as a standalone system, its ability to model how long parallel SUs would take was limited, especially with the flexible buffer-management schemes used for the second-generation SIs. A new version of POMS was developed in 1997. This version uses a formal proposal syntax (the same used for primary observations) for parallels, so that different proposals can be handled uniformly and without the need for customized ``crafting rules.'' In addition, POMS is integrated with the Transformation (TRANS) planning system in order to give it full knowledge of overheads within an SU, eliminating the need for ad hoc modeling. The power and versatility of this approach has paid off in improved utilization of parallel opportunities, greatly reduced maintenance costs, and an ability to gracefully handle new parallel proposals and new SIs with minimal software effort. This paper discusses the requirements, design, and operational results of the new POMS.

  18. Particle Transport in Parallel-Plate Reactors

    SciTech Connect

    Rader, D.J.; Geller, A.S.

    1999-08-01

    A major cause of semiconductor yield degradation is contaminant particles that deposit on wafers while they reside in processing tools during integrated circuit manufacturing. This report presents numerical models for assessing particle transport and deposition in a parallel-plate geometry characteristic of a wide range of single-wafer processing tools: uniform downward flow exiting a perforated-plate showerhead separated by a gap from a circular wafer resting on a parallel susceptor. Particles are assumed to originate either upstream of the showerhead or from a specified position between the plates. The physical mechanisms controlling particle deposition and transport (inertia, diffusion, fluid drag, and external forces) are reviewed, with an emphasis on conditions encountered in semiconductor process tools (i.e., sub-atmospheric pressures and submicron particles). Isothermal flow is assumed, although small temperature differences are allowed to drive particle thermophoresis. Numerical solutions of the flow field are presented which agree with an analytic, creeping-flow expression for Re < 4. Deposition is quantified by use of a particle collection efficiency, which is defined as the fraction of particles in the reactor that deposit on the wafer. Analytic expressions for collection efficiency are presented for the limiting case where external forces control deposition (i.e., neglecting particle diffusion and inertia). Deposition from simultaneous particle diffusion and external forces is analyzed by an Eulerian formulation; for creeping flow and particles released from a planar trap, the analysis yields an analytic, integral expression for particle deposition based on process and particle properties. Deposition from simultaneous particle inertia and external forces is analyzed by a Lagrangian formulation, which can describe inertia-enhanced deposition resulting from particle acceleration in the showerhead. An approximate analytic expression is derived for particle

  19. A component analysis based on serial results analyzing performance of parallel iterative programs

    SciTech Connect

    Richman, S.C.

    1994-12-31

    This research is concerned with the parallel performance of iterative methods for solving large, sparse, nonsymmetric linear systems. Most of the iterative methods are first presented with their time costs and convergence rates examined intensively on sequential machines, and then adapted to parallel machines. The analysis of the parallel iterative performance is more complicated than that of serial performance, since the former can be affected by many new factors, such as data communication schemes, number of processors used, and Ordering and mapping techniques. Although the author is able to summarize results from data obtained after examining certain cases by experiments, two questions remain: (1) How to explain the results obtained? (2) How to extend the results from the certain cases to general cases? To answer these two questions quantitatively, the author introduces a tool called component analysis based on serial results. This component analysis is introduced because the iterative methods consist mainly of several basic functions such as linked triads, inner products, and triangular solves, which have different intrinsic parallelisms and are suitable for different parallel techniques. The parallel performance of each iterative method is first expressed as a weighted sum of the parallel performance of the basic functions that are the components of the method. Then, one separately examines the performance of basic functions and the weighting distributions of iterative methods, from which two independent sets of information are obtained when solving a given problem. In this component approach, all the weightings require only serial costs not parallel costs, and each iterative method for solving a given problem is represented by its unique weighting distribution. The information given by the basic functions is independent of iterative method, while that given by weightings is independent of parallel technique, parallel machine and number of processors.

  20. When the Lowest Energy Does Not Induce Native Structures: Parallel Minimization of Multi-Energy Values by Hybridizing Searching Intelligences

    PubMed Central

    Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

    2012-01-01

    Background Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to obtain. Results A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed during the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving PSP problem. Conclusions This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise. PMID:23028708

  1. High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

    ERIC Educational Resources Information Center

    von Davier, Matthias

    2016-01-01

    This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…

  2. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  3. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    SciTech Connect

    Lu Liuyan Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.

    2009-08-20

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f{sub m}pi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive

  4. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    NASA Astrophysics Data System (ADS)

    Lu, Liuyan; Lantz, Steven R.; Ren, Zhuyin; Pope, Stephen B.

    2009-08-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel

  5. Prototyping Parallel and Distributed Programs in Proteus

    DTIC Science & Technology

    1990-10-01

    Cole90, Gibb89]. " Highly-parallel processors - Applications for highly-parallel machines such as the CM- 2 or the iPSC are programmed using data...Programming, (Prentice-Hall, Englewood Cliffs, NJ) 1990. [Gibb89] Gibbons , P.B., "A more practical PRAM model", in: Proceedings of the First ACM

  6. Simulation Exploration through Immersive Parallel Planes: Preprint

    SciTech Connect

    Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny; Smith, Steve

    2016-03-01

    We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinate's mapping the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used to both explore existing data as well as to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.

  7. Parallel computation with the spectral element method

    SciTech Connect

    Ma, Hong

    1995-12-01

    Spectral element models for the shallow water equations and the Navier-Stokes equations have been successfully implemented on a data parallel supercomputer, the Connection Machine model CM-5. The nonstaggered grid formulations for both models are described, which are shown to be especially efficient in data parallel computing environment.

  8. Predicting Protein Structure Using Parallel Genetic Algorithms.

    DTIC Science & Technology

    1994-12-01

    By " Predicting rotein Structure D istribticfiar.. ................ Using Parallel Genetic Algorithms ,Avaiu " ’ •"... Dist THESIS I IGeorge H...iiLite-d Approved for public release; distribution unlimited AFIT/ GCS /ENG/94D-03 Predicting Protein Structure Using Parallel Genetic Algorithms ...1-1 1.2 Genetic Algorithms ......... ............................ 1-3 1.3 The Protein Folding Problem

  9. Parallel Activation in Bilingual Phonological Processing

    ERIC Educational Resources Information Center

    Lee, Su-Yeon

    2011-01-01

    In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…

  10. Serial Order: A Parallel Distributed Processing Approach.

    ERIC Educational Resources Information Center

    Jordan, Michael I.

    Human behavior shows a variety of serially ordered action sequences. This paper presents a theory of serial order which describes how sequences of actions might be learned and performed. In this theory, parallel interactions across time (coarticulation) and parallel interactions across space (dual-task interference) are viewed as two aspects of a…

  11. MULTIOBJECTIVE PARALLEL GENETIC ALGORITHM FOR WASTE MINIMIZATION

    EPA Science Inventory

    In this research we have developed an efficient multiobjective parallel genetic algorithm (MOPGA) for waste minimization problems. This MOPGA integrates PGAPack (Levine, 1996) and NSGA-II (Deb, 2000) with novel modifications. PGAPack is a master-slave parallel implementation of a...

  12. Parallel Narrative Structure in Paul Harding's "Tinkers"

    ERIC Educational Resources Information Center

    Çirakli, Mustafa Zeki

    2014-01-01

    The present paper explores the implications of parallel narrative structure in Paul Harding's "Tinkers" (2009). Besides primarily recounting the two sets of parallel narratives, "Tinkers" also comprises of seemingly unrelated fragments such as excerpts from clock repair manuals and diaries. The main stories, however, told…

  13. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  14. Parallel unstructured grid generation for computational aerosciences

    NASA Technical Reports Server (NTRS)

    Shephard, Mark S.

    1993-01-01

    The objective of this research project is to develop efficient parallel automatic grid generation procedures for use in computational aerosciences. This effort is focused on a parallel version of the Finite Octree grid generator. Progress made during the first six months is reported.

  15. A hadron calorimeter with scintillators parallel to the beam

    NASA Astrophysics Data System (ADS)

    Abramov, V.; Goncharov, P.; Gorin, A.; Gurzhiev, A.; Dyshkant, A.; Evdokimov, V.; Kolosov, V.; Korablev, A.; Korneev, Yu.; Kostritskii, A.; Krinitsyn, A.; Kryshkin, V.; Podstavkov, V.; Polyakov, V.; Shtannikov, A.; Tereschenko, S.; Turchanovich, L.; Zaichenko, A.

    1997-02-01

    A hadron calorimeter in which scintillators are arranged nearly parallel to the incident particle direction and light is collected by optical fibres with WLS, has been built. The iron absorber plates are of the tapered shape to fit a barrel structure of the collider geometry. The performance of the calorimeter studied with hadron beam is presented as a function of tilt angle without and with electromagnetic calorimeter in front of the hadron one.

  16. Open parabosonic string theory between two parallel Dp-branes

    SciTech Connect

    Hamam, D.; Belaloui, N.

    2012-06-27

    We investigate an open parabosonic string theory between two parallel Dp-branes. The spectrum is constructed and the partition function is derived. A common chord between the development of this latter and the degeneracy of the states for each mass level is obtained. The theory is consistent and with no tachyon. The Virasoro algebra is derived and compared to the one of the ordinary case.

  17. Parallel algorithms for simulating continuous time Markov chains

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.

  18. TSE computers - A means for massively parallel computations

    NASA Technical Reports Server (NTRS)

    Strong, J. P., III

    1976-01-01

    A description is presented of hardware concepts for building a massively parallel processing system for two-dimensional data. The processing system is to use logic arrays of 128 x 128 elements which perform over 16 thousand operations simultaneously. Attention is given to image data, logic arrays, basic image logic functions, a prototype negator, an interleaver device, image logic circuits, and an image memory circuit.

  19. Differences Between Distributed and Parallel Systems

    SciTech Connect

    Brightwell, R.; Maccabe, A.B.; Rissen, R.

    1998-10-01

    Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.

  20. Parallel-In-Time For Moving Meshes

    SciTech Connect

    Falgout, R. D.; Manteuffel, T. A.; Southworth, B.; Schroder, J. B.

    2016-02-04

    With steadily growing computational resources available, scientists must develop e ective ways to utilize the increased resources. High performance, highly parallel software has be- come a standard. However until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial di erential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing se- quential codes with only minor modi cations. In this work, a rezoning-type moving mesh is applied to a di usion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.

  1. Configuration space representation in parallel coordinates

    NASA Technical Reports Server (NTRS)

    Fiorini, Paolo; Inselberg, Alfred

    1989-01-01

    By means of a system of parallel coordinates, a nonprojective mapping from R exp N to R squared is obtained for any positive integer N. In this way multivariate data and relations can be represented in the Euclidean plane (embedded in the projective plane). Basically, R squared with Cartesian coordinates is augmented by N parallel axes, one for each variable. The N joint variables of a robotic device can be represented graphically by using parallel coordinates. It is pointed out that some properties of the relation are better perceived visually from the parallel coordinate representation, and that new algorithms and data structures can be obtained from this representation. The main features of parallel coordinates are described, and an example is presented of their use for configuration space representation of a mechanical arm (where Cartesian coordinates cannot be used).

  2. Implementation and performance of parallel Prolog interpreter

    SciTech Connect

    Wei, S.; Kale, L.V.; Balkrishna, R. . Dept. of Computer Science)

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.

  3. Parallel Algebraic Multigrid Methods - High Performance Preconditioners

    SciTech Connect

    Yang, U M

    2004-11-11

    The development of high performance, massively parallel computers and the increasing demands of computationally challenging applications have necessitated the development of scalable solvers and preconditioners. One of the most effective ways to achieve scalability is the use of multigrid or multilevel techniques. Algebraic multigrid (AMG) is a very efficient algorithm for solving large problems on unstructured grids. While much of it can be parallelized in a straightforward way, some components of the classical algorithm, particularly the coarsening process and some of the most efficient smoothers, are highly sequential, and require new parallel approaches. This chapter presents the basic principles of AMG and gives an overview of various parallel implementations of AMG, including descriptions of parallel coarsening schemes and smoothers, some numerical results as well as references to existing software packages.

  4. National Combustion Code: Parallel Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.

    2000-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.

  5. Parallelization of a Compositional Reservoir Simulator

    NASA Astrophysics Data System (ADS)

    Reme, Hilde; Åge Øye, Geir; Espedal, Magne S.; Fladmark, Gunnar E.

    A finite volume dicretization has been used to solve compositional flow in porous media. Secondary migration in fractured rocks has been the main motivation for the work. Multipoint flux approximation has been implemented and adaptive local grid refinement, based on domain decomposition, is used at fractures and faults. The parallelization method, which is described in this paper, strongly promotes code reuse and gives a very high level of parallelization despite low implementation costs. The programming framework is also portable to other platforms or other applications. We have presented computer experiments to examine the parallel efficiency of the implemented parallel simulator with respect to scalability and speedup. Keywords: porous media, multipoint flux approximation, domain decomposition, parallelization

  6. Genetic Parallel Programming: design and implementation.

    PubMed

    Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

    2006-01-01

    This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

  7. Parallel hypergraph partitioning for scientific computing.

    SciTech Connect

    Heaphy, Robert; Devine, Karen Dragon; Catalyurek, Umit; Bisseling, Robert; Hendrickson, Bruce Alan; Boman, Erik Gunnar

    2005-07-01

    Graph partitioning is often used for load balancing in parallel computing, but it is known that hypergraph partitioning has several advantages. First, hypergraphs more accurately model communication volume, and second, they are more expressive and can better represent nonsymmetric problems. Hypergraph partitioning is particularly suited to parallel sparse matrix-vector multiplication, a common kernel in scientific computing. We present a parallel software package for hypergraph (and sparse matrix) partitioning developed at Sandia National Labs. The algorithm is a variation on multilevel partitioning. Our parallel implementation is novel in that it uses a two-dimensional data distribution among processors. We present empirical results that show our parallel implementation achieves good speedup on several large problems (up to 33 million nonzeros) with up to 64 processors on a Linux cluster.

  8. Broadcasting a message in a parallel computer

    DOEpatents

    Berg, Jeremy E.; Faraj, Ahmad A.

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.

  9. Sequential bioequivalence approaches for parallel designs.

    PubMed

    Fuglsang, Anders

    2014-05-01

    Regulators in EU, USA and Canada allow the use of two-stage approaches for evaluation of bioequivalence. The purpose of this paper is to evaluate such designs for parallel groups using trial simulations. The methods developed by Diane Potvin and co-workers were adapted to parallel designs. Trials were simulated and evaluated on basis of either equal or unequal variances between treatment groups. Methods B and C of Potvin et al., when adapted for parallel designs, protected well against type I error rate inflation under all of the simulated scenarios. Performance characteristics of the new parallel design methods showed little dependence on the assumption of equality of the test and reference variances. This is the first paper to describe the performance of two-stage approaches for parallel designs used to evaluate bioequivalence. The results may prove useful to sponsors developing formulations where crossover designs for bioequivalence evaluation are undesirable.

  10. Parallel algorithms for interactive manipulation of digital terrain models

    NASA Technical Reports Server (NTRS)

    Davis, E. W.; Mcallister, D. F.; Nagaraj, V.

    1988-01-01

    Interactive three-dimensional graphics applications, such as terrain data representation and manipulation, require extensive arithmetic processing. Massively parallel machines are attractive for this application since they offer high computational rates, and grid connected architectures provide a natural mapping for grid based terrain models. Presented here are algorithms for data movement on the massive parallel processor (MPP) in support of pan and zoom functions over large data grids. It is an extension of earlier work that demonstrated real-time performance of graphics functions on grids that were equal in size to the physical dimensions of the MPP. When the dimensions of a data grid exceed the processing array size, data is packed in the array memory. Windows of the total data grid are interactively selected for processing. Movement of packed data is needed to distribute items across the array for efficient parallel processing. Execution time for data movement was found to exceed that for arithmetic aspects of graphics functions. Performance figures are given for routines written in MPP Pascal.

  11. Parallel hybrid algorithm for solution in electrical impedance equation

    NASA Astrophysics Data System (ADS)

    Ponomaryov, Volodymyr; Robles-Gonzalez, Marco; Bucio-Ramirez, Ariana; Ramirez-Tachiquin, Marco; Ramos-Diaz, Eduardo

    2015-02-01

    This work is dedicated to the analysis of the forward and the inverse problem to obtain a better approximation to the Electrical Impedance Tomography equation. In this case, we employ for the forward problem the numerical method based on the Taylor series in formal power and for the inverse problem the Finite Element Method. For the analysis of the forward problem, we proposed a novel algorithm, which employs a regularization technique for the stability, additionally the parallel computing is used to obtain the solution faster; this modification permits to obtain an efficient solution of the forward problem. Then, the found solution is used in the inverse problem for the approximation employing the Finite Element Method. The algorithms employed in this work are developed in structural programming paradigm in C++, including parallel processing; the time run analysis is performed only in the forward problem because the Finite Element Method due to their high recursive does not accept parallelism. Some examples are performed for this analysis, in which several conductivity functions are employed for two different cases: for the analytical cases: the exponential and sinusoidal functions are used, and for the geometrical cases the circle at center and five disk structure are revised as conductivity functions. The Lebesgue measure is used as metric for error estimation in the forward problem, meanwhile, in the inverse problem PSNR, SSIM, MSE criteria are applied, to determine the convergence of both methods.

  12. p-MEMPSODE: Parallel and irregular memetic global optimization

    NASA Astrophysics Data System (ADS)

    Voglis, C.; Hadjidoukas, P. E.; Parsopoulos, K. E.; Papageorgiou, D. G.; Lagaris, I. E.; Vrahatis, M. N.

    2015-12-01

    A parallel memetic global optimization algorithm suitable for shared memory multicore systems is proposed and analyzed. The considered algorithm combines two well-known and widely used population-based stochastic algorithms, namely Particle Swarm Optimization and Differential Evolution, with two efficient and parallelizable local search procedures. The sequential version of the algorithm was first introduced as MEMPSODE (MEMetic Particle Swarm Optimization and Differential Evolution) and published in the CPC program library. We exploit the inherent and highly irregular parallelism of the memetic global optimization algorithm by means of a dynamic and multilevel approach based on the OpenMP tasking model. In our case, tasks correspond to local optimization procedures or simple function evaluations. Parallelization occurs at each iteration step of the memetic algorithm without affecting its searching efficiency. The proposed implementation, for the same random seed, reaches the same solution irrespectively of being executed sequentially or in parallel. Extensive experimental evaluation has been performed in order to illustrate the speedup achieved on a shared-memory multicore server.

  13. Ordered fast fourier transforms on a massively parallel hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Tong, Charles; Swarztrauber, Paul N.

    1989-01-01

    Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2 exp r elements and the hypercube has P = 2 exp d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.

  14. A portable MPI-based parallel vector template library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  15. A Portable MPI-Based Parallel Vector Template Library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C + + by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of c or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  16. Instant well-log inversion with a parallel computer

    SciTech Connect

    Kimminau, S.J.; Trivedi, H.

    1993-08-01

    Well-log analysis requires several vectors of input data to be inverted with a physical model that produces more vectors of output data. The problem is inherently suited to either vectorization or parallelization. PLATO (parallel log analysis, timely output) is a research prototype system that uses a parallel architecture computer with memory-mapped graphics to invert vector data and display the result rapidly. By combining this high-performance computing and display system with a graphical user interface, the analyst can interact with the system in real time'' and can visualize the result of changing parameters on up to 1,000 levels of computed volumes and reconstructed logs. It is expected that such instant'' inversion will remove the main disadvantages frequently cited for simultaneous analysis methods, namely difficulty in assessing sensitivity to different parameters and slow output response. Although the prototype system uses highly specific features of a parallel processor, a subsequent version has been implemented on a conventional (Serial) workstation with less performance but adequate functionality to preserve the apparently instant response. PLATO demonstrates the feasibility of petroleum computing applications combining an intuitive graphical interface, high-performance computing of physical models, and real-time output graphics.

  17. Parallel heat transport in integrable and chaotic magnetic fields

    SciTech Connect

    Del-Castillo-Negrete, Diego B; Chacon, Luis

    2012-01-01

    The study of transport in magnetized plasmas is a problem of fundamental interest in controlled fusion, space plasmas, and astrophysics research. Three issues make this problem particularly chal- lenging: (i) The extreme anisotropy between the parallel (i.e., along the magnetic field), , and the perpendicular, , conductivities ( / may exceed 1010 in fusion plasmas); (ii) Magnetic field lines chaos which in general complicates (and may preclude) the construction of magnetic field line coordinates; and (iii) Nonlocal parallel transport in the limit of small collisionality. Motivated by these issues, we present a Lagrangian Green s function method to solve the local and non-local parallel transport equation applicable to integrable and chaotic magnetic fields in arbitrary geom- etry. The method avoids by construction the numerical pollution issues of grid-based algorithms. The potential of the approach is demonstrated with nontrivial applications to integrable (magnetic island chain), weakly chaotic (devil s staircase), and fully chaotic magnetic field configurations. For the latter, numerical solutions of the parallel heat transport equation show that the effective radial transport, with local and non-local closures, is non-diffusive, thus casting doubts on the appropriateness of the applicability of quasilinear diffusion descriptions. General conditions for the existence of non-diffusive, multivalued flux-gradient relations in the temperature evolution are derived.

  18. Parallel reservoir computing using optical amplifiers.

    PubMed

    Vandoorne, Kristof; Dambre, Joni; Verstraeten, David; Schrauwen, Benjamin; Bienstman, Peter

    2011-09-01

    Reservoir computing (RC), a computational paradigm inspired on neural systems, has become increasingly popular in recent years for solving a variety of complex recognition and classification problems. Thus far, most implementations have been software-based, limiting their speed and power efficiency. Integrated photonics offers the potential for a fast, power efficient and massively parallel hardware implementation. We have previously proposed a network of coupled semiconductor optical amplifiers as an interesting test case for such a hardware implementation. In this paper, we investigate the important design parameters and the consequences of process variations through simulations. We use an isolated word recognition task with babble noise to evaluate the performance of the photonic reservoirs with respect to traditional software reservoir implementations, which are based on leaky hyperbolic tangent functions. Our results show that the use of coherent light in a well-tuned reservoir architecture offers significant performance benefits. The most important design parameters are the delay and the phase shift in the system's physical connections. With optimized values for these parameters, coherent semiconductor optical amplifier (SOA) reservoirs can achieve better results than traditional simulated reservoirs. We also show that process variations hardly degrade the performance, but amplifier noise can be detrimental. This effect must therefore be taken into account when designing SOA-based RC implementations.

  19. Highly Parallel, High-Precision Numerical Integration

    SciTech Connect

    Bailey, David H.; Borwein, Jonathan M.

    2005-04-22

    This paper describes a scheme for rapidly computing numerical values of definite integrals to very high accuracy, ranging from ordinary machine precision to hundreds or thousands of digits, even for functions with singularities or infinite derivatives at endpoints. Such a scheme is of interest not only in computational physics and computational chemistry, but also in experimental mathematics, where high-precision numerical values of definite integrals can be used to numerically discover new identities. This paper discusses techniques for a parallel implementation of this scheme, then presents performance results for 1-D and 2-D test suites. Results are also given for a certain problem from mathematical physics, which features a difficult singularity, confirming a conjecture to 20,000 digit accuracy. The performance rate for this latter calculation on 1024 CPUs is 690 Gflop/s. We believe that this and one other 20,000-digit integral evaluation that we report are the highest-precision non-trivial numerical integrations performed to date.

  20. A parallel framework for Bayesian reinforcement learning

    NASA Astrophysics Data System (ADS)

    Barrett, Enda; Duggan, Jim; Howley, Enda

    2014-01-01

    Solving a finite Markov decision process using techniques from dynamic programming such as value or policy iteration require a complete model of the environmental dynamics. The distribution of rewards, transition probabilities, states and actions all need to be fully observable, discrete and complete. For many problem domains, a complete model containing a full representation of the environmental dynamics may not be readily available. Bayesian reinforcement learning (RL) is a technique devised to make better use of the information observed through learning than simply computing Q-functions. However, this approach can often require extensive experience in order to build up an accurate representation of the true values. To address this issue, this paper proposes a method for parallelising a Bayesian RL technique aimed at reducing the time it takes to approximate the missing model. We demonstrate the technique on learning next state transition probabilities without prior knowledge. The approach is general enough for approximating any probabilistically driven component of the model. The solution involves multiple learning agents learning in parallel on the same task. Agents share probability density estimates amongst each other in an effort to speed up convergence to the true values.

  1. Parallel Processing in Face Perception

    ERIC Educational Resources Information Center

    Martens, Ulla; Leuthold, Hartmut; Schweinberger, Stefan R.

    2010-01-01

    The authors examined face perception models with regard to the functional and temporal organization of facial identity and expression analysis. Participants performed a manual 2-choice go/no-go task to classify faces, where response hand depended on facial familiarity (famous vs. unfamiliar) and response execution depended on facial expression…

  2. [Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

    PubMed

    Furuta, Takuya; Sato, Tatsuhiko

    2015-01-01

    Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.

  3. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.

  4. Measurement of parallel ion energy distribution function in PISCES plasma

    SciTech Connect

    Tynan, G.R.; Goebel, D.M.; Conn, R.W.

    1987-08-01

    The PISCES facility is used to conduct controlled plasma-surface interaction experiments. Plasma parameters typical of those found in the edge plasmas of major fusion confinement experiments are produced. In this work, the energy distribution of the ion flux incident on a material surface is measured using a gridded energy analyzer in place of a material sample. The full width at half maximum energy distribution of the ion flux is found to vary from 10 eV to 30 eV both hydrogen and deuterium plasmas. Helium plasmas have a much lower FWHM energy spread than hydrogen and deuterium plasmas. The FWHM ion energy spread is found to be linearly related to the electron temperature. The most probable ion energy is found to be linearly related to the bias applied to the energy analyzer. Other plasma parameters have a weak influence upon the energy distribution of the ion flux. Two possible physical mechanisms for producing the observed results are introduced and suggestions for further work are made. The impact of the reported measurements on the materials experiments conducted in the PISCES facility are discussed and recommendations for future experiments are made. 11 refs., 13 figs.

  5. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  6. Xyce parallel electronic simulator : users' guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique

  7. Improved packing of protein side chains with parallel ant colonies

    PubMed Central

    2014-01-01

    Introduction The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many of existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. Methods We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamer by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. Results We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive to state-of-the-art solutions for packing side chains. Conclusions This parallel approach combines various sources of searching intelligence and energy

  8. Highly Parallel Modern Signal Processing.

    DTIC Science & Technology

    1982-02-28

    looked at the application of these techniques to systems with coherent speckle noise, such as synthetic aperature (SAR) imagery, coherent sonar and...pprtitioned matrix inversion , comput;atio-n o"f crossambigul ty fun~ctions, formation of outer prCdu1cL tAand skewed outer products, and multiplication of...operations are multiplication, inversion , and L-U decomposition. In signal processing such operations can be found in adaptive filtering, data

  9. Parallelization of the Implicit RPLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Orkwis, Paul D.

    1997-01-01

    The multiblock reacting Navier-Stokes flow solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.

  10. Parallelization of the Implicit RPLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Orkwis, Paul D.

    1994-01-01

    The multiblock reacting Navier-Stokes flow-solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.

  11. Knowledge representation into Ada parallel processing

    NASA Technical Reports Server (NTRS)

    Masotto, Tom; Babikyan, Carol; Harper, Richard

    1990-01-01

    The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.

  12. Time-parallel multiscale/multiphysics framework

    SciTech Connect

    Frantziskonis, G.; Muralidharan, Krishna; Deymier, Pierre; Simunovic, Srdjan; Nukala, Phani K; Pannala, Sreekanth

    2009-01-01

    We introduce the time-parallel compound wavelet matrix method (tpCWM) for modeling the temporal evolution of multiscale and multiphysics systems. The method couples time parallel (TP) and CWM methods operating at different spatial and temporal scales. We demonstrate the efficiency of our approach on two examples: a chemical reaction kinetic system and a non-linear predator prey system. Our results indicate that the tpCWM technique is capable of accelerating time-to-solution by 2 3-orders of magnitude and is amenable to efficient parallel implementation.

  13. Language constructs for modular parallel programs

    SciTech Connect

    Foster, I.

    1996-03-01

    We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrence, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.

  14. Distributed parallel messaging for multiprocessor systems

    SciTech Connect

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.

  15. Parallel path aspects of transmission modeling

    SciTech Connect

    Kavicky, J.A.; Shahidehpour, S.M.

    1996-11-01

    This paper examines the present methods and modeling techniques available to address the effects of parallel flows resulting from various firm and short-term energy transactions. A survey of significant methodologies is conducted to determine the present status of parallel flow transaction modeling. The strengths and weaknesses of these approaches are identified to suggest areas of further modeling improvements. The motivating force behind this research is to improve transfer capability assessment accuracy by suggesting a real-time modeling environment that adequately represents the influences of parallel flows while recognizing operational constraints and objectives.

  16. Fast combinatorial optimization with parallel digital computers.

    PubMed

    Kakeya, H; Okabe, Y

    2000-01-01

    This paper presents an algorithm which realizes fast search for the solutions of combinatorial optimization problems with parallel digital computers.With the standard weight matrices designed for combinatorial optimization, many iterations are required before convergence to a quasioptimal solution even when many digital processors can be used in parallel. By removing the components of the eingenvectors with eminent negative eigenvalues of the weight matrix, the proposed algorithm avoids oscillation and realizes energy reduction under synchronous discrete dynamics, which enables parallel digital computers to obtain quasi-optimal solutions with much less time than the conventional algorithm.

  17. Synchronization Of Parallel Discrete Event Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S.

    1992-01-01

    Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.

  18. Heterogeneous parallel programming capability. Final report

    SciTech Connect

    Flower, J.W.; Kolawa, A.

    1990-11-30

    In creating a heterogeneous parallel processing capability we are really trying to approach three basic problems with current systems: (1) Supercomputer and parallel computer hardware architectures vary widely but need to support one or two fairly standard programming languages and programming models. A particularly important issue concerns the short life cycle of individual hardware designs; (2) Many algorithms require capabilities beyond the reach of single superconducters but could be approached by several machines working together; and (3) Performing a given task requires integration of a system that may contain many components in addition to the super or parallel computer itself. Peripherals from many different manufacturers must be incorporated.

  19. Poisson-Boltzmann theory for two parallel uniformly charged plates

    SciTech Connect

    Xing Xiangjun

    2011-04-15

    We solve the nonlinear Poisson-Boltzmann equation for two parallel and like-charged plates both inside a symmetric electrolyte, and inside a 2:1 asymmetric electrolyte, in terms of Weierstrass elliptic functions. From these solutions we derive the functional relation between the surface charge density, the plate separation, and the pressure between plates. For the one plate problem, we obtain exact expressions for the electrostatic potential and for the renormalized surface charge density, both in symmetric and in asymmetric electrolytes. For the two plate problems, we obtain new exact asymptotic results in various regimes.

  20. Parallel distance matrix computation for Matlab data mining

    NASA Astrophysics Data System (ADS)

    Skurowski, Przemysław; Staniszewski, Michał

    2016-06-01

    The paper presents utility functions for computing of a distance matrix, which plays a crucial role in data mining. The goal in the design was to enable operating on relatively large datasets by overcoming basic shortcoming - computing time - with an interface easy to use. The presented solution is a set of functions, which were created with emphasis on practical applicability in real life. The proposed solution is presented along the theoretical background for the performance scaling. Furthermore, different approaches of the parallel computing are analyzed, including shared memory, which is uncommon in Matlab environment.

  1. Parallel Processing of Broad-Band PPM Signals

    NASA Technical Reports Server (NTRS)

    Gray, Andrew; Kang, Edward; Lay, Norman; Vilnrotter, Victor; Srinivasan, Meera; Lee, Clement

    2010-01-01

    A parallel-processing algorithm and a hardware architecture to implement the algorithm have been devised for timeslot synchronization in the reception of pulse-position-modulated (PPM) optical or radio signals. As in the cases of some prior algorithms and architectures for parallel, discrete-time, digital processing of signals other than PPM, an incoming broadband signal is divided into multiple parallel narrower-band signals by means of sub-sampling and filtering. The number of parallel streams is chosen so that the frequency content of the narrower-band signals is low enough to enable processing by relatively-low speed complementary metal oxide semiconductor (CMOS) electronic circuitry. The algorithm and architecture are intended to satisfy requirements for time-varying time-slot synchronization and post-detection filtering, with correction of timing errors independent of estimation of timing errors. They are also intended to afford flexibility for dynamic reconfiguration and upgrading. The architecture is implemented in a reconfigurable CMOS processor in the form of a field-programmable gate array. The algorithm and its hardware implementation incorporate three separate time-varying filter banks for three distinct functions: correction of sub-sample timing errors, post-detection filtering, and post-detection estimation of timing errors. The design of the filter bank for correction of timing errors, the method of estimating timing errors, and the design of a feedback-loop filter are governed by a host of parameters, the most critical one, with regard to processing very broadband signals with CMOS hardware, being the number of parallel streams (equivalently, the rate-reduction parameter).

  2. Semirigorous synchronous sublattice algorithm for parallel kinetic Monte Carlo simulations of thin film growth

    NASA Astrophysics Data System (ADS)

    Shim, Yunsic; Amar, Jacques G.

    2005-03-01

    The standard kinetic Monte Carlo algorithm is an extremely efficient method to carry out serial simulations of dynamical processes such as thin film growth. However, in some cases it is necessary to study systems over extended time and length scales, and therefore a parallel algorithm is desired. Here we describe an efficient, semirigorous synchronous sublattice algorithm for parallel kinetic Monte Carlo simulations. The accuracy and parallel efficiency are studied as a function of diffusion rate, processor size, and number of processors for a variety of simple models of epitaxial growth. The effects of fluctuations on the parallel efficiency are also studied. Since only local communications are required, linear scaling behavior is observed, e.g., the parallel efficiency is independent of the number of processors for fixed processor size.

  3. The analysis of measurement accuracy of the parallel binocular stereo vision system

    NASA Astrophysics Data System (ADS)

    Yu, Huan; Xing, Tingwen; Jia, Xin

    2016-09-01

    Parallel binocular stereo vision system is a special form of binocular vision system. In order to simulate the human eyes observation state, the two cameras used to obtain images of the target scene are placed parallel to each other. This paper built a triangular geometric model, analyzed the structure parameters of parallel binocular stereo vision system and the correlations between them, and discussed the influences of baseline distance B between two cameras, the focal length f, the angle of view ω and other structural parameters on the accuracy of measurement. This paper used Matlab software to test the error function of parallel binocular stereo vision system under different structure parameters, and the simulation results showed the range of structure parameters when errors were small, thereby improved the accuracy of parallel binocular stereo vision system.

  4. Smart structures technologies for parallel kinematics in handling and assembly

    NASA Astrophysics Data System (ADS)

    Keimer, Ralf; Algermissen, Stephan; Pavlovic, Nenad; Budde, Christoph

    2007-04-01

    Parallel kinematics offer a high potential for increasing performance of machines for handling and assembly. Due to greater stiffness and reduced moving masses compared to typical serial kinematics, higher accelerations and thus lower cycle times can be achieved, which is an essential benchmark for high performance in handling and assembly. However, there are some challenges left to be able to fully exploit the potential of such machines. Some of these challenges are inherent to parallel kinematics, like a low ratio between work and installation space or a considerably changing structural elasticity as a function of the position in work space. Other difficulties arise from high accelerations, which lead to high dynamic loads inducing significant vibrations. While it is essential to cope with the challenges of parallel kinematics in the design-process, smart structures technologies lend themselves as means to face some of these challenges. In this paper a 4-degree of freedom parallel mechanism based on a triglide structure is presented. This machine was designed in a way to overcome the problem of low ratio between work and installation space, by allowing for a change of the structure's configuration with the purpose of increasing the work space. Furthermore, an active vibration suppression was designed and incorporated using rods with embedded piezoceramic actuators. The design of these smart structural parts is discussed and experimental results regarding the vibration suppression are shown. Adaptive joints are another smart structures technology, which can be used to increase the performance of parallel kinematics. The adaptiveness of such joints is reflected in their ability to change their friction attributes, whereas they can be used on one hand to suppress vibrations and on the other hand to change the degrees of freedom (DOF). The vibration suppression is achieved by increasing structural damping at the end of a trajectory and by maintaining low friction

  5. The Construction of Parallel Tests from IRT-Based Item Banks. Project Psychometric Aspects of Item Banking No. 43. Research Report 89-2.

    ERIC Educational Resources Information Center

    Boekkooi-Timminga, Ellen

    The construction of parallel tests from item response theory (IRT) based item banks is discussed. Tests are considered parallel whenever their information functions are identical. After the methods for constructing parallel tests are considered, the computational complexity of 0-1 linear programming and the heuristic procedure applied are…

  6. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS) runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, are handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.

  7. Data parallel sorting for particle simulation

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1992-01-01

    Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O (N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimun performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.

  8. Runtime support for parallelizing data mining algorithms

    NASA Astrophysics Data System (ADS)

    Jin, Ruoming; Agrawal, Gagan

    2002-03-01

    With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the technique we have developed starting from a common specification of the algorithm.

  9. Parallel processing of a rotating shaft simulation

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.

    1989-01-01

    A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parellelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.

  10. NAS Parallel Benchmarks, Multi-Zone Versions

    NASA Technical Reports Server (NTRS)

    vanderWijngaart, Rob F.; Haopiang, Jin

    2003-01-01

    We describe an extension of the NAS Parallel Benchmarks (NPB) suite that involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy, which is common among structured-mesh production flow solver codes in use at NASA Ames and elsewhere, provides relatively easily exploitable coarse-grain parallelism between meshes. Since the individual application benchmarks also allow fine-grain parallelism themselves, this NPB extension, named NPB Multi-Zone (NPB-MZ), is a good candidate for testing hybrid and multi-level parallelization tools and strategies.

  11. Parallel programming with PCN. Revision 1

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. In includes both tutorial and reference material. It also presents the basic concepts that underly PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).

  12. A Nomograph for Resistors in Parallel

    NASA Astrophysics Data System (ADS)

    Greenslade, Thomas B.

    2002-11-01

    The author has a large collection of 19th century textbooks, and found a quotation in one of them concerning a construction for finding the equivalent resistance of two resistors in parallel. This note discusses the equations.

  13. Parallel Implementation of the Discontinuous Galerkin Method

    NASA Technical Reports Server (NTRS)

    Baggag, Abdalkader; Atkins, Harold; Keyes, David

    1999-01-01

    This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in all object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.

  14. Feature Clustering for Accelerating Parallel Coordinate Descent

    SciTech Connect

    Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.

    2012-12-06

    We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.

  15. Parallel algorithms for dynamically partitioning unstructured grids

    SciTech Connect

    Diniz, P.; Plimpton, S.; Hendrickson, B.; Leland, R.

    1994-10-01

    Grid partitioning is the method of choice for decomposing a wide variety of computational problems into naturally parallel pieces. In problems where computational load on the grid or the grid itself changes as the simulation progresses, the ability to repartition dynamically and in parallel is attractive for achieving higher performance. We describe three algorithms suitable for parallel dynamic load-balancing which attempt to partition unstructured grids so that computational load is balanced and communication is minimized. The execution time of algorithms and the quality of the partitions they generate are compared to results from serial partitioners for two large grids. The integration of the algorithms into a parallel particle simulation is also briefly discussed.

  16. The PISCES 2 parallel programming environment

    NASA Technical Reports Server (NTRS)

    Pratt, Terrence W.

    1987-01-01

    PISCES 2 is a programming environment for scientific and engineering computations on MIMD parallel computers. It is currently implemented on a flexible FLEX/32 at NASA Langley, a 20 processor machine with both shared and local memories. The environment provides an extended Fortran for applications programming, a configuration environment for setting up a run on the parallel machine, and a run-time environment for monitoring and controlling program execution. This paper describes the overall design of the system and its implementation on the FLEX/32. Emphasis is placed on several novel aspects of the design: the use of a carefully defined virtual machine, programmer control of the mapping of virtual machine to actual hardware, forces for medium-granularity parallelism, and windows for parallel distribution of data. Some preliminary measurements of storage use are included.

  17. Massively Parallel Computing: A Sandia Perspective

    SciTech Connect

    Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

    1999-05-06

    The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.

  18. Improved chopper circuit uses parallel transistors

    NASA Technical Reports Server (NTRS)

    1966-01-01

    Parallel transistor chopper circuit operates with one transistor in the forward mode and the other in the inverse mode. By using this method, it acts as a single, symmetrical, bidirectional transistor, and reduces and stabilizes the offset voltage.

  19. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer.

    PubMed

    Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas

    2016-04-01

    Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes due to systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station, however, due to multiple invocations for a large number of subtasks the full task requires a significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods as well as a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer software mpiWrapper has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface has been implemented to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .

  20. Parallelization of MATLAB for Euro50 integrated modeling

    NASA Astrophysics Data System (ADS)

    Browne, Michael; Andersen, Torben E.; Enmark, Anita; Moraru, Dan; Shearer, Andrew

    2004-09-01

    MATLAB and its companion product Simulink are commonly used tools in systems modelling and other scientific disciplines. A cross-disciplinary integrated MATLAB model is used to study the overall performance of the proposed 50m optical and infrared telescope, Euro50. However the computational requirements of this kind of end-to-end simulation of the telescope's behaviour, exceeds the capability of an individual contemporary Personal Computer. By parallelizing the model, primarily on a functional basis, it can be implemented across a Beowulf cluster of generic PCs. This requires MATLAB to distribute in some way data and calculations to the cluster nodes and combine completed results. There have been a number of attempts to produce toolkits to allow MATLAB to be used in a parallel fashion. They have used a variety of techniques. Here we present findings from using some of these toolkits and proposed advances.