Sample records for chain running parallel

  1. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems. Algorithms that may lead to effective parallelization of such systems were investigated, and both the forward- and backward-chained control paradigms were covered in the course of this work. The best computer architectures for the developed algorithms were also researched. Two experimental vehicles were built to facilitate this research: Backpac, a parallel backward-chained rule-based reasoning system, and Datapac, a parallel forward-chained rule-based reasoning system. Both systems are written in Multilisp, a version of Lisp which contains the parallel construct future. Applying future to a function call spawns its evaluation as a task that runs in parallel with the spawning task. Backpac and Datapac have been run on several disparate parallel processors: an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32-processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines, whereas the Multimax hangs all its processors off a common bus. All are shared-memory machines, but they have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10-processor Encore and on the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped-down version of EMYCIN.
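
    Multilisp's future is the key parallel primitive here: wrapping a computation in future spawns its evaluation as a concurrent task, and a consumer of the value blocks until the task completes. A minimal sketch of the same idiom using Python's concurrent.futures, offered as an analogy only (not Multilisp, and not the Backpac/Datapac code):

      from concurrent.futures import ThreadPoolExecutor

      def fire_rule(fact):
          # stand-in for evaluating one rule against working memory
          return fact * 2

      with ThreadPoolExecutor() as pool:
          # submit() plays the role of future: the call becomes a parallel task
          futures = [pool.submit(fire_rule, f) for f in range(8)]
          # result() plays the role of touching a future: it blocks until ready
          print([f.result() for f in futures])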

  2. Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics.

    PubMed

    Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel

    2012-09-25

    Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which use whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Markov chain Monte Carlo algorithms and strategies are described in this context. Parallel Monte Carlo algorithms are introduced as a starting point, including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, and some variants are discussed as well. Features and strategies of parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which not only leads to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that the use of parallel Markov chain Monte Carlo will have a profound impact on the computational tools for genomic selection programs.

  3. Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics

    PubMed Central

    2012-01-01

    Background: Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which use whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Results: Parallel Markov chain Monte Carlo algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point, including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, and some variants are discussed as well. Features and strategies of parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Conclusions: Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which not only leads to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that the use of parallel Markov chain Monte Carlo will have a profound impact on the computational tools for genomic selection programs. PMID:23009363
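
    Of the two approaches above, running multiple chains is the simplest to parallelize because the chains never communicate. A toy sketch, assuming a Gaussian target and a plain Metropolis sampler (a stand-in for the genomic models discussed, not the authors' code):

      import math
      import multiprocessing as mp
      import random

      def log_post(theta):
          # toy log-posterior (standard normal); a real application would
          # evaluate the genomic model's likelihood and prior here
          return -0.5 * theta * theta

      def run_chain(seed, n=10000):
          rng = random.Random(seed)
          theta, lp = 0.0, log_post(0.0)
          draws = []
          for _ in range(n):
              prop = theta + rng.gauss(0.0, 1.0)
              lp_prop = log_post(prop)
              if math.log(rng.random()) < lp_prop - lp:   # Metropolis accept
                  theta, lp = prop, lp_prop
              draws.append(theta)
          return draws

      if __name__ == "__main__":
          with mp.Pool(4) as pool:
              chains = pool.map(run_chain, [1, 2, 3, 4])  # one chain per worker
          pooled = [x for c in chains for x in c[1000:]]  # discard burn-in
          print(sum(pooled) / len(pooled))

    Pooling draws from several shorter chains trades wall-clock time against the burn-in overhead each chain pays, which is exactly the trade-off between the within-chain and multiple-chain strategies discussed above.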

  4. 1H-Indole-3-carbaldehyde.

    PubMed

    Dileep, C S; Abdoh, M M M; Chakravarthy, M P; Mohana, K N; Sridhar, M A

    2012-11-01

    In the title compound, C(9)H(7)NO, the benzene ring forms a dihedral angle of 3.98 (12)° with the pyrrole ring. In the crystal, N-H⋯O hydrogen bonds link the molecules into chains which run parallel to [02-1].

  5. 1H-Indole-3-carbaldehyde

    PubMed Central

    Dileep, C. S.; Abdoh, M. M. M.; Chakravarthy, M. P.; Mohana, K. N.; Sridhar, M. A.

    2012-01-01

    In the title compound, C9H7NO, the benzene ring forms a dihedral angle of 3.98 (12)° with the pyrrole ring. In the crystal, N–H⋯O hydrogen bonds link the molecules into chains which run parallel to [02-1]. PMID:23284457

  6. The Effects of Treadmill Running on Aging Laryngeal Muscle Structure

    PubMed Central

    Kletzien, Heidi; Russell, John A.; Connor, Nadine P.

    2015-01-01

    Level of Evidence: NA (animal study). Objective: Age-related changes in laryngeal muscle structure and function may contribute to deficits in voice and swallowing observed in elderly people. We hypothesized that treadmill running, an exercise that increases respiratory drive to upper airway muscles, would induce changes in thyroarytenoid muscle myosin heavy chain (MHC) isoforms consistent with a fast-slow transformation in muscle fiber type. Study Design: Randomized parallel-group controlled trial. Methods: Fifteen young adult and 14 old Fischer 344/Brown Norway rats received either treadmill running or no exercise (5 days/week for 8 weeks). Myosin heavy chain isoform composition in the thyroarytenoid muscle was examined at the end of 8 weeks. Results: Significant age and treatment effects were found. The young adult group had the greatest proportion of superfast-contracting MHCIIL. The treadmill running group had the lowest proportion of MHCIIL and the greatest proportion of MHCIIx. Conclusion: Thyroarytenoid muscle structure was affected both by age and by treadmill running in a fast-slow transition that is characteristic of exercise manipulations in other skeletal muscles. PMID:26256100

  7. Running accuracy analysis of a 3-RRR parallel kinematic machine considering the deformations of the links

    NASA Astrophysics Data System (ADS)

    Wang, Liping; Jiang, Yao; Li, Tiemin

    2014-09-01

    Parallel kinematic machines have drawn considerable attention and have been widely used in some special fields. However, high precision is still one of the challenges when they are used for advanced machine tools. One of the main reasons is that the kinematic chains of parallel kinematic machines are composed of elongated links that can easily suffer deformations, especially at high speeds and under heavy loads. A 3-RRR parallel kinematic machine is taken as a study object for investigating its accuracy, taking into account the deformations of its links during the motion process. Based on the dynamic model constructed by the Newton-Euler method, all the inertia loads and constraint forces of the links are computed and their deformations are derived. The kinematic errors of the machine are then derived with consideration of these deformations. Through further derivation, the accuracy of the machine is given as a simple explicit expression, which helps to increase the calculation speed. The accuracy of this machine when following a selected circular path is simulated. The influence of the magnitude of the maximum acceleration and of external loads on the running accuracy of the machine is investigated. The results show that external loads deteriorate the accuracy of the machine tremendously when their direction coincides with the direction of the worst stiffness of the machine. The proposed method provides a solution for predicting the running accuracy of parallel kinematic machines and can also be used in their design optimization as well as in the selection of suitable running parameters.

  8. Li0.5Al0.5Mg2(MoO4)3

    PubMed Central

    Ennajeh, Ines; Zid, Mohamed Faouzi; Driss, Ahmed

    2013-01-01

    The title compound, lithium/aluminium dimagnesium tetrakis[orthomolybdate(VI)], was prepared by a solid-state reaction route. The crystal structure is built up from MgO6 octahedra and MoO4 tetrahedra sharing corners and edges, forming two types of chains running along [100]. These chains are linked into layers parallel to (010) and finally linked by MoO4 tetrahedra into a three-dimensional framework structure with channels parallel to [001] in which lithium and aluminium cations equally occupy the same position within a distorted trigonal–bipyramidal coordination environment. The title structure is isotypic with LiMgIn(MoO4)3, with the In site becoming an Mg site and the fully occupied Li site a statistically occupied Li/Al site in the title structure. PMID:24426975

  9. Bayesian tomography by interacting Markov chains

    NASA Astrophysics Data System (ADS)

    Romary, T.

    2017-12-01

    In seismic tomography, we seek to determine the velocity of the underground from noisy first-arrival travel-time observations. In most situations, this is an ill-posed inverse problem that admits several imperfect solutions. Given an a priori distribution over the parameters of the velocity model, the Bayesian formulation allows us to state this problem as a probabilistic one, with a solution in the form of a posterior distribution. The posterior distribution is generally high dimensional and may exhibit multimodality. Moreover, as it is known only up to a constant, the only sensible way to address this problem is to try to generate simulations from the posterior. The natural tools to perform these simulations are Markov chain Monte Carlo (MCMC) methods. Classical implementations of MCMC algorithms generally suffer from slow mixing: the generated states are slow to enter the stationary regime, that is, to fit the observations, and when one mode of the posterior is eventually identified, it may become difficult to visit others. Using a varying temperature parameter that relaxes the constraint on the data may help to enter the stationary regime. Besides, the sequential nature of MCMC makes it ill suited to parallel implementation. Running a large number of chains in parallel may be suboptimal, as the information gathered by each chain is not mutualized. Parallel tempering (PT) can be seen as a first attempt to make parallel chains at different temperatures communicate, but they only exchange information between current states. In this talk, I will show that PT actually belongs to a general class of interacting Markov chain algorithms. I will also show that this class makes it possible to design interacting schemes that can take advantage of the whole history of the chain, by authorizing exchanges toward already visited states. The algorithms will be illustrated with toy examples and an application to first-arrival travel-time tomography.
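
    For reference, parallel tempering runs chains that sample tempered targets π(x)^β and occasionally proposes exchanging the states of neighbouring temperatures. A toy sketch with an assumed bimodal target (the chains are updated in one loop for clarity; in practice each chain would occupy its own processor and only the swap step would communicate):

      import math
      import random

      rng = random.Random(0)

      def log_post(x):
          # bimodal toy posterior with modes at -3 and +3
          return math.log(math.exp(-0.5 * (x - 3) ** 2) + math.exp(-0.5 * (x + 3) ** 2))

      betas = [1.0, 0.5, 0.25, 0.1]          # inverse temperatures; beta = 1 is the target
      states = [0.0] * len(betas)

      for step in range(20000):
          for i, b in enumerate(betas):      # local Metropolis move for each chain
              prop = states[i] + rng.gauss(0, 1)
              if math.log(rng.random()) < b * (log_post(prop) - log_post(states[i])):
                  states[i] = prop
          i = rng.randrange(len(betas) - 1)  # propose swapping neighbouring chains
          dlogp = log_post(states[i + 1]) - log_post(states[i])
          if math.log(rng.random()) < (betas[i] - betas[i + 1]) * dlogp:
              states[i], states[i + 1] = states[i + 1], states[i]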

  10. A Component-Based Extension Framework for Large-Scale Parallel Simulations in NEURON

    PubMed Central

    King, James G.; Hines, Michael; Hill, Sean; Goodman, Philip H.; Markram, Henry; Schürmann, Felix

    2008-01-01

    As neuronal simulations approach larger scales with increasing levels of detail, the neurosimulator software represents only one part of a chain of tools ranging from setup and simulation to interaction with virtual environments, analysis and visualization. Previously published approaches to abstracting simulator engines have not received widespread acceptance, which in part may be due to the fact that they tried to address the challenge of solving the model specification problem. Here, we present an approach that uses a neurosimulator, in this case NEURON, to describe and instantiate the network model in the simulator's native model language, but then replaces the main integration loop with its own. Existing parallel network models are easily adapted to run in the presented framework. The presented approach is thus an extension to NEURON, but it uses a component-based architecture to allow for replaceable spike-exchange components and pluggable components for monitoring, analysis, or control that can run in this framework alongside the simulation. PMID:19430597

  11. Frequentist and Bayesian Orbital Parameter Estimation from Radial Velocity Data Using RVLIN, BOOTTRAN, and RUN DMC

    NASA Astrophysics Data System (ADS)

    Nelson, Benjamin Earl; Wright, Jason Thomas; Wang, Sharon

    2015-08-01

    For this hack session, we will present three tools used in analyses of radial velocity exoplanet systems. RVLIN is a set of IDL routines used to quickly fit an arbitrary number of Keplerian curves to radial velocity data to find adequate parameter point estimates. BOOTTRAN is an IDL-based extension of RVLIN to provide orbital parameter uncertainties using bootstrap based on a Keplerian model. RUN DMC is a highly parallelized Markov chain Monte Carlo algorithm that employs an n-body model, primarily used for dynamically complex or poorly constrained exoplanet systems. We will compare the performance of these tools and their applications to various exoplanet systems.

  12. 2-Ferrocenyl-3-methoxy-6-methylpyridine

    PubMed Central

    Xu, Chen; Hao, Xin-Qi; Liu, Fang; Wu, Xiu-Juan; Song, Mao-Ping

    2009-01-01

    In the title compound, [Fe(C5H5)(C12H12NO)], the dihedral angle between the pyridyl and substituted cyclopentadienyl rings is 23.58 (3)°. The crystal structure is characterized by weak intermolecular C—H⋯N hydrogen-bonding contacts, leading to the formation of chains running parallel to the n-glide planes. A weak intermolecular C—H⋯π contact is also present. PMID:21583761

  13. Two drimane lactones, valdiviolide and 11-epivaldiviolide, in the form of a 1:1 cocrystal obtained from Drimys winteri extracts.

    PubMed

    Paz Robles, Cristian; Mercado, Darío; Suarez, Sebastián; Baggio, Ricardo

    2014-12-01

    A cocrystal, C15H22O3·C15H22O3, (I), obtained from Drimys winteri, is composed of two isomeric drimane sesquiterpene lactones, namely valdiviolide, (Ia), and 11-epivaldiviolide, (Ib), neither of which has been reported in the crystal form. Both diastereoisomers present three chiral centres at sites 5, 10 and 11, with an SSR sequence in (Ia) and an SSS sequence in (Ib). O-H···O hydrogen bonds bind molecules into chains running along [120] and the chains are in turn linked by π-π stacking interactions to define planar weakly interacting arrays parallel to (001).

  14. Crystal structure of poly[{μ-N,N′-bis[(pyridin-4-yl)methyl]oxalamide}-μ-oxalato-cobalt(II)]

    PubMed Central

    Zou, Hengye; Qi, Yanjuan

    2014-01-01

    In the polymeric title compound, [Co(C2O4)(C14H14N4O2)]n, the CoII atom is six-coordinated by two N atoms from symmetry-related bis[(pyridin-4-yl)methyl]oxalamide (BPMO) ligands and four O atoms from two centrosymmetric oxalate anions in a distorted octahedral coordination geometry. The CoII atoms are linked by the oxalate anions into a chain running parallel to [100]. The chains are linked by the BPMO ligands into a three-dimensional architecture. In addition, N—H⋯O hydrogen bonds stabilize the crystal packing. PMID:25309173

  15. Crystal structure of (2R*,3aR*)-2-phenylsulfonyl-2,3,3a,4,5,6-hexahydropyrrolo[1,2-b]isoxazole.

    PubMed

    Hernández, Yaiza; Marcos, Isidro; Garrido, Narciso M; Sanz, Francisca; Diez, David

    2017-01-01

    The title compound, C12H15NO3S, was prepared by 1,3-dipolar cycloaddition of 3,4-dihydro-2H-pyrrole 1-oxide and phenyl vinyl sulfone. In the molecule, both fused five-membered rings display a twisted conformation. In the crystal, C-H⋯O hydrogen bonds link neighbouring molecules, forming chains running parallel to the b axis.

  16. Data assimilation using a GPU accelerated path integral Monte Carlo approach

    NASA Astrophysics Data System (ADS)

    Quinn, John C.; Abarbanel, Henry D. I.

    2011-09-01

    The answers to data assimilation questions can be expressed as path integrals over all possible state and parameter histories. We show how these path integrals can be evaluated numerically using a Markov Chain Monte Carlo method designed to run in parallel on a graphics processing unit (GPU). We demonstrate the application of the method to an example with a transmembrane voltage time series of a simulated neuron as an input, and using a Hodgkin-Huxley neuron model. By taking advantage of GPU computing, we gain a parallel speedup factor of up to about 300, compared to an equivalent serial computation on a CPU, with performance increasing as the length of the observation time used for data assimilation increases.
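
    The speedup here comes from data parallelism: many samples are advanced in lock step, so each Metropolis update becomes one vectorised operation over the whole population. A CPU stand-in for that pattern using numpy, with a toy target in place of the paper's Hodgkin-Huxley model:

      import numpy as np

      rng = np.random.default_rng(0)
      n_walkers = 100_000
      x = np.zeros(n_walkers)

      def log_post(x):
          # toy target; a path-integral application would sum over time steps
          return -0.5 * x ** 2

      lp = log_post(x)
      for _ in range(1000):
          prop = x + rng.normal(0.0, 1.0, n_walkers)     # one vectorised proposal
          lp_prop = log_post(prop)
          accept = np.log(rng.random(n_walkers)) < lp_prop - lp
          x = np.where(accept, prop, x)                  # accept/reject en masse
          lp = np.where(accept, lp_prop, lp)
      print(x.mean(), x.std())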

  17. Improving the scalability of hyperspectral imaging applications on heterogeneous platforms using adaptive run-time data compression

    NASA Astrophysics Data System (ADS)

    Plaza, Antonio; Plaza, Javier; Paz, Abel

    2010-10-01

    Latest generation remote sensing instruments (called hyperspectral imagers) are now able to generate hundreds of images, corresponding to different wavelength channels, for the same area on the surface of the Earth. In previous work, we have reported that the scalability of parallel processing algorithms dealing with these high-dimensional data volumes is affected by the amount of data to be exchanged through the communication network of the system. However, large messages are common in hyperspectral imaging applications since processing algorithms are pixel-based, and each pixel vector to be exchanged through the communication network is made up of hundreds of spectral values. Thus, decreasing the amount of data to be exchanged could improve the scalability and parallel performance. In this paper, we propose a new framework based on intelligent utilization of wavelet-based data compression techniques for improving the scalability of a standard hyperspectral image processing chain on heterogeneous networks of workstations. This type of parallel platform is quickly becoming a standard in hyperspectral image processing due to the distributed nature of collected hyperspectral data as well as its flexibility and low cost. Our experimental results indicate that adaptive lossy compression can lead to improvements in the scalability of the hyperspectral processing chain without sacrificing analysis accuracy, even at sub-pixel precision levels.

  18. Patterns of anterior and posterior muscle chain interactions during high performance long-hang elements in gymnastics.

    PubMed

    von Laßberg, Christoph; Rapp, Walter; Krug, Jürgen

    2014-06-01

    In a prior study with high-level gymnasts, we demonstrated that the neuromuscular activation pattern during the "whip-like" leg acceleration phases (LAP) in accelerating movement sequences on the high bar primarily runs in consecutive succession from the bar (punctum fixum) to the legs (punctum mobile). The current study presents how the neuromuscular activation is expressed in the antagonist muscle chain during the movement sequences that immediately follow the LAP, generating an effective transfer of momentum for performing specific elements based on the energy generated by the preceding LAP. Thirteen high-level gymnasts were assessed by surface electromyography during high-performance elements on the high bar and parallel bars. The results show that the neuromuscular succession runs primarily from punctum mobile towards punctum fixum to generate the transfer of momentum. Additionally, further principles of neuromuscular interaction between the anterior and posterior muscle chains during such movement sequences are presented. The findings complement the understanding of neuromuscular activation patterns during rotational movements around fixed axes and will help to form the basis of more direct and better teaching methods for earlier optimization and facilitation of the motor learning process concerning fundamental movement requirements. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS, with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from the working memory of remote nodes. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C3I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with partitioned knowledge bases indicate that significant speed increases, including superlinear speedups in some cases, are possible.
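
    A sketch of the remote assert/retract idea only (this is not the CLIPS API): each node owns a local working memory, and parallel commands become messages placed on the queues of remote nodes:

      import multiprocessing as mp

      def node(inbox):
          memory = set()                      # this node's working memory
          while True:
              op, fact = inbox.get()
              if op == "assert":
                  memory.add(fact)
              elif op == "retract":
                  memory.discard(fact)
              elif op == "stop":
                  print(sorted(memory))
                  break

      if __name__ == "__main__":
          queues = [mp.Queue() for _ in range(3)]
          nodes = [mp.Process(target=node, args=(q,)) for q in queues]
          for p in nodes:
              p.start()
          queues[1].put(("assert", "meal-ready"))   # assert a fact on a remote node
          queues[1].put(("retract", "meal-ready"))
          queues[2].put(("assert", "table-set"))
          for q in queues:
              q.put(("stop", None))
          for p in nodes:
              p.join()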

  20. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

    DOE PAGES

    Böhme, David; Geimer, Markus; Arnold, Lukas; ...

    2016-07-20

    Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.

  21. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Böhme, David; Geimer, Markus; Arnold, Lukas

    Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.

  22. Crystal structures of isomeric 3,5-dichloro-N-(2,3-dimethylphenyl)benzenesulfonamide, 3,5-dichloro-N-(2,6-dimethylphenyl)benzenesulfonamide and 3,5-dichloro-N-(3,5-dimethylphenyl)benzenesulfonamide.

    PubMed

    Shakuntala, K; Naveen, S; Lokanath, N K; Suchetan, P A

    2017-05-01

    The crystal structures of three isomeric compounds of formula C14H13Cl2NO2S, namely 3,5-dichloro-N-(2,3-dimethylphenyl)benzenesulfonamide (I), 3,5-dichloro-N-(2,6-dimethylphenyl)benzenesulfonamide (II) and 3,5-dichloro-N-(3,5-dimethylphenyl)benzenesulfonamide (III), are described. The molecules of all three compounds are U-shaped, with the two aromatic rings inclined at 41.3 (6)° in (I), 42.1 (2)° in (II) and 54.4 (3)° in (III). The molecular conformation of (II) is stabilized by intramolecular C-H⋯O hydrogen bonds and C-H⋯π interactions. The crystal structure of (I) features N-H⋯O hydrogen-bonded R2^2(8) loops interconnected via C(7) chains of C-H⋯O interactions, forming a three-dimensional architecture. The structure also features π-π interactions [Cg⋯Cg = 3.6970 (14) Å]. In (II), N-H⋯O hydrogen-bonded R2^2(8) loops are interconnected via π-π interactions [intercentroid distance = 3.606 (3) Å] to form a one-dimensional architecture running parallel to the a axis. In (III), adjacent C(4) chains of N-H⋯O hydrogen-bonded molecules running parallel to [010] are connected via C-H⋯π interactions, forming sheets parallel to the ab plane. Neighbouring sheets are linked via offset π-π interactions [intercentroid distance = 3.8303 (16) Å] to form a three-dimensional architecture.

  23. PELE web server: atomistic study of biomolecular systems at your fingertips.

    PubMed

    Madadkar-Sobhani, Armin; Guallar, Victor

    2013-07-01

    PELE, Protein Energy Landscape Exploration, our novel technology based on protein structure prediction algorithms and Monte Carlo sampling, is capable of modelling all-atom protein-ligand dynamical interactions in an efficient and fast manner, at a computational cost two orders of magnitude lower than traditional molecular dynamics techniques. PELE's heuristic approach generates trial moves based on protein and ligand perturbations followed by side-chain sampling and global/local minimization. The collection of accepted steps forms a stochastic trajectory. Furthermore, several processors may be run in parallel towards a collective goal or to define several independent trajectories; the whole procedure has been parallelized using the Message Passing Interface. Here, we introduce the PELE web server, designed to make the whole process of running simulations easier and more practical by minimizing input file demands, providing a user-friendly interface and producing abstract outputs (e.g. interactive graphs and tables). The web server has been implemented in C++ using Wt (http://www.webtoolkit.eu) and MySQL (http://www.mysql.com). The PELE web server, accessible at http://pele.bsc.es, is free and open to all users with no login requirement.

  24. Appraisal of jump distributions in ensemble-based sampling algorithms

    NASA Astrophysics Data System (ADS)

    Dejanic, Sanda; Scheidegger, Andreas; Rieckermann, Jörg; Albert, Carlo

    2017-04-01

    Sampling Bayesian posteriors of model parameters is often required for making model-based probabilistic predictions. For complex environmental models, standard Markov chain Monte Carlo (MCMC) methods are often infeasible because they require too many sequential model runs. Therefore, we focused on ensemble methods that use many Markov chains in parallel, since they can be run on modern cluster architectures. Little is known about how to choose the best-performing sampler for a given application, and a poor choice can lead to an inappropriate representation of posterior knowledge. We assessed two different jump moves, the stretch and the differential evolution move, underlying, respectively, the software packages EMCEE and DREAM, which are popular in different scientific communities. For the assessment, we used analytical posteriors with features as they often occur in real posteriors, namely high dimensionality, strong non-linear correlations or multimodality. For posteriors with non-linear features, standard convergence diagnostics based on sample means can be insufficient, so we resorted to an entropy-based convergence measure. We assessed the samplers by means of their convergence speed, robustness and effective sample sizes. For posteriors with strongly non-linear features, we found that the stretch move outperforms the differential evolution move with respect to all three aspects.
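
    For concreteness, the stretch move underlying EMCEE proposes, for walker k, a point on the line through walker k and a randomly chosen complementary walker j. A toy one-dimensional sketch with an assumed Gaussian target (serial loop for clarity; ensemble methods parallelise because each walker's update needs only the positions of the others):

      import math
      import random

      rng = random.Random(0)
      a = 2.0                            # stretch scale parameter
      d = 1                              # dimension of the parameter space

      def log_post(x):
          return -0.5 * x * x            # toy target density

      walkers = [rng.gauss(0, 1) for _ in range(20)]
      for step in range(5000):
          for k in range(len(walkers)):
              j = rng.randrange(len(walkers) - 1)
              if j >= k:
                  j += 1                 # complementary walker, j != k
              z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a   # z ~ g(z) ∝ 1/sqrt(z)
              y = walkers[j] + z * (walkers[k] - walkers[j])
              # acceptance includes the z^(d-1) factor from the stretch move
              log_q = (d - 1) * math.log(z) + log_post(y) - log_post(walkers[k])
              if math.log(rng.random()) < log_q:
                  walkers[k] = y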

  25. Metal-organic framework assembled from erbium and a tetrapodal polyphosphonic acid organic linker.

    PubMed

    Mendes, Ricardo F; Firmino, Ana D G; Tomé, João P C; Almeida Paz, Filipe A

    2018-06-01

    A three-dimensional metal-organic framework (MOF), poly[[μ6-5'-pentahydrogen [1,1'-biphenyl]-3,3',5,5'-tetrayltetrakis(phosphonato)]erbium(III)] 2.5-hydrate], formulated as [Er(C12H11O12P4)]·2.5H2O or [Er(H5btp)]·2.5H2O (I) and isotypical with a Y3+-based MOF reported previously by our research group [Firmino et al. (2017b). Inorg. Chem. 56, 1193-1208], was constructed based solely on Er3+ and on the polyphosphonic organic linker [1,1'-biphenyl]-3,3',5,5'-tetrakis(phosphonic acid) (H8btp). The present work describes our efforts to introduce lanthanide cations into the flexible network, demonstrating that, on the one hand, the compound can be obtained using three distinct experimental methods, i.e. hydro(solvo)thermal (Hy), microwave-assisted (MW) and one-pot (Op), and, on the other hand, that crystallite size can be approximately fine-tuned according to the method employed. MOF I contains hexacoordinated Er3+ cations which are distributed in a zigzag inorganic chain running parallel to the [100] direction of the unit cell. The chains are, in turn, bridged by the anionic organic linker to form a three-dimensional 6,6-connected binodal network. This connectivity leads to the existence of one-dimensional channels (also running parallel to the [100] direction) filled with disordered and partially occupied water molecules of crystallization which are engaged in O-H⋯O hydrogen-bonding interactions with the [Er(H5btp)] framework. Additional weak π-π interactions [intercentroid distance = 3.957 (7) Å] exist between aromatic rings, which help to maintain the structural integrity of the network.

  26. ATLAS Tile calorimeter calibration and monitoring systems

    NASA Astrophysics Data System (ADS)

    Chomont, Arthur; ATLAS Collaboration

    2017-11-01

    The ATLAS Tile Calorimeter (TileCal) is the central section of the hadronic calorimeter of the ATLAS experiment and provides important information for reconstruction of hadrons, jets, hadronic decays of tau leptons and missing transverse energy. This sampling calorimeter uses steel plates as absorber and scintillating tiles as active medium. The light produced by the passage of charged particles is transmitted by wavelength shifting fibres to photomultiplier tubes (PMTs), located on the outside of the calorimeter. The readout is segmented into about 5000 cells (longitudinally and transversally), each of them being read out by two PMTs in parallel. To calibrate and monitor the stability and performance of each part of the readout chain during the data taking, a set of calibration systems is used. The TileCal calibration system comprises cesium radioactive sources, Laser and charge injection elements, and allows for monitoring and equalization of the calorimeter response at each stage of the signal production, from scintillation light to digitization. Based on LHC Run 1 experience, several calibration systems were improved for Run 2. The lessons learned, the modifications, and the current LHC Run 2 performance are discussed.

  27. SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeff S.

    1992-01-01

    Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.

  28. De novo modeling of the F420-reducing [NiFe]-hydrogenase from a methanogenic archaeon by cryo-electron microscopy

    PubMed Central

    Mills, Deryck J; Vitt, Stella; Strauss, Mike; Shima, Seigo; Vonck, Janet

    2013-01-01

    Methanogenic archaea use a [NiFe]-hydrogenase, Frh, for oxidation/reduction of F420, an important hydride carrier in the methanogenesis pathway from H2 and CO2. Frh accounts for about 1% of the cytoplasmic protein and forms a huge complex consisting of FrhABG heterotrimers, each with a [NiFe] center, four Fe-S clusters and an FAD. Here, we report the structure of Frh, with and without bound substrate F420, determined by near-atomic resolution cryo-EM. The polypeptide chain of FrhB, for which there was no homolog, was traced de novo from the EM map. The 1.2-MDa complex contains 12 copies of the heterotrimer, which unexpectedly form a spherical protein shell with a hollow core. The cryo-EM map reveals strong electron density for the chains of metal clusters running parallel to the protein shell, and the F420-binding site is located at the end of the chain near the outside of the spherical structure. DOI: http://dx.doi.org/10.7554/eLife.00218.001 PMID:23483797

  29. A computer program for uncertainty analysis integrating regression and Bayesian methods

    USGS Publications Warehouse

    Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary

    2014-01-01

    This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
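
    The DREAM algorithm used here descends from differential-evolution MCMC, in which each chain's proposal is built from the difference between two other randomly chosen chains. A toy sketch of that basic jump; DREAM itself adds subspace sampling, outlier handling and adaptive crossover on top:

      import math
      import random

      rng = random.Random(0)
      d = 1
      gamma = 2.38 / math.sqrt(2 * d)    # standard DE-MC jump scale

      def log_post(x):
          return -0.5 * x * x            # toy target density

      chains = [rng.gauss(0, 1) for _ in range(10)]
      for step in range(5000):
          for i in range(len(chains)):
              # pick two distinct chains other than i to form the jump vector
              r1, r2 = rng.sample([j for j in range(len(chains)) if j != i], 2)
              prop = chains[i] + gamma * (chains[r1] - chains[r2]) + rng.gauss(0, 1e-4)
              if math.log(rng.random()) < log_post(prop) - log_post(chains[i]):
                  chains[i] = prop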

  30. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task between the CM-5 and the workstation can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.

  31. Implementation and performance of parallel Prolog interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, S.; Kale, L.V.; Balkrishna, R.

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared-memory systems (an Alliant FX/8, a Sequent and a Multimax) and a non-shared-memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.

  32. SCoPE: an efficient method of Cosmological Parameter Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Das, Santanu; Souradeep, Tarun, E-mail: santanud@iucaa.ernet.in, E-mail: tarun@iucaa.ernet.in

    The Markov chain Monte Carlo (MCMC) sampler is widely used for cosmological parameter estimation from CMB and other data. However, due to the intrinsically serial nature of the MCMC sampler, convergence is often very slow. Here we present a fast and independently written Monte Carlo method for cosmological parameter estimation, named Slick Cosmological Parameter Estimator (SCoPE), that employs delayed rejection to increase the acceptance rate of a chain, and pre-fetching that helps an individual chain to run on parallel CPUs. An inter-chain covariance update is also incorporated to prevent clustering of the chains, allowing faster and better mixing of the chains. We use an adaptive method to calculate and update the covariance automatically as the chains progress. Our analysis shows that the acceptance probability of each step in SCoPE is more than 95% and that the convergence of the chains is faster. Using SCoPE, we carry out some cosmological parameter estimations with different cosmological models using WMAP-9 and Planck results. One of the current research interests in cosmology is quantifying the nature of dark energy; we analyze the cosmological parameters from two illustrative, commonly used parameterisations of dark energy models. We also assess whether the primordial helium fraction in the universe can be constrained by the present CMB data from WMAP-9 and Planck. The results from our MCMC analysis on the one hand help us to understand the workings of SCoPE better, and on the other hand provide a completely independent estimation of cosmological parameters from WMAP-9 and Planck data.
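
    A sketch of the pre-fetching idea in a generic depth-one form (an illustration of the technique, not SCoPE's exact scheme): while the accept/reject decision for the current proposal is pending, the log-posteriors of both possible next proposals are evaluated on parallel workers, so two sequential Metropolis steps cost roughly one likelihood round:

      import math
      import random
      from concurrent.futures import ProcessPoolExecutor

      def log_post(theta):
          # stand-in for an expensive likelihood evaluation
          return -0.5 * theta * theta

      def metropolis(rng, x, lp_x, y, lp_y):
          # one accept/reject decision with the log-posteriors precomputed
          if math.log(rng.random()) < lp_y - lp_x:
              return y, lp_y
          return x, lp_x

      if __name__ == "__main__":
          rng = random.Random(0)
          x, lp_x = 0.0, log_post(0.0)
          with ProcessPoolExecutor(max_workers=3) as pool:
              for _ in range(1000):
                  y1 = x + rng.gauss(0, 1)       # proposal for the current step
                  y_rej = x + rng.gauss(0, 1)    # next proposal if y1 is rejected
                  y_acc = y1 + rng.gauss(0, 1)   # next proposal if y1 is accepted
                  f1, f_rej, f_acc = [pool.submit(log_post, y)
                                      for y in (y1, y_rej, y_acc)]
                  x, lp_x = metropolis(rng, x, lp_x, y1, f1.result())
                  if x == y1:                    # first step accepted
                      x, lp_x = metropolis(rng, x, lp_x, y_acc, f_acc.result())
                  else:
                      x, lp_x = metropolis(rng, x, lp_x, y_rej, f_rej.result())

    With three workers, each round of three concurrent evaluations advances the chain by two steps, at the price of one wasted evaluation per round.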

  33. Exploring first-order phase transitions with population annealing

    NASA Astrophysics Data System (ADS)

    Barash, Lev Yu.; Weigel, Martin; Shchur, Lev N.; Janke, Wolfhard

    2017-03-01

    Population annealing is a hybrid of sequential and Markov chain Monte Carlo methods geared towards the efficient parallel simulation of systems with complex free-energy landscapes. Systems with first-order phase transitions are among the problems in computational physics that are difficult to tackle with standard methods such as local-update simulations in the canonical ensemble, for example with the Metropolis algorithm. It is hence interesting to see whether such transitions can be more easily studied using population annealing. We report here our preliminary observations from population annealing runs for the two-dimensional Potts model with q > 4, where it undergoes a first-order transition.
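
    A toy population-annealing sketch, assuming a double-well energy and a made-up temperature schedule: the population is cooled in steps, resampled in proportion to exp(-Δβ·E), then decorrelated with Metropolis sweeps (the per-replica updates are the embarrassingly parallel part):

      import math
      import random

      rng = random.Random(0)

      def energy(x):
          return (x * x - 1.0) ** 2      # double-well toy energy

      pop = [rng.uniform(-2, 2) for _ in range(1000)]
      beta = 0.0
      for beta_next in [0.5, 1.0, 2.0, 4.0, 8.0]:
          dB = beta_next - beta
          w = [math.exp(-dB * energy(x)) for x in pop]
          pop = rng.choices(pop, weights=w, k=len(pop))   # resample by weight
          beta = beta_next
          for _ in range(10):            # Metropolis sweeps at the new temperature
              for i in range(len(pop)):
                  prop = pop[i] + rng.gauss(0, 0.5)
                  if math.log(rng.random()) < -beta * (energy(prop) - energy(pop[i])):
                      pop[i] = prop
      print(sum(pop) / len(pop))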

  34. Crystal structure at 2.8 Å of the DLLRKN-containing coiled-coil domain of huntingtin-interacting protein 1 (HIP1) reveals a surface suitable for clathrin light chain binding.

    PubMed

    Ybe, Joel A; Mishra, Sanjay; Helms, Stephen; Nix, Jay

    2007-03-16

    Huntingtin-interacting protein 1 (HIP1) is a member of a family of proteins whose interaction with Huntingtin is critical to prevent cells from initiating apoptosis. HIP1 and the related protein HIP12/1R can also bind to clathrin and membrane phospholipids, and HIP12/1R links the clathrin-coated vesicle (CCV) to the actin cytoskeleton. HIP1 and HIP12/1R interact with the clathrin light chain EED regulatory site and stimulate clathrin lattice assembly. Here, we report the X-ray structure of the coiled-coil domain of HIP1 (residues 482-586) that includes residues crucial for binding clathrin light chain. The dimeric HIP1 crystal structure is partially splayed open. Comparison of the HIP1 model with coiled-coil predictions revealed that the heptad repeat in the dimeric trunk (S2 path) is offset relative to the register of the heptad repeat from the N-terminal portion (S1 path) of the molecule. Furthermore, surface analysis showed there is a third hydrophobic path (S3) running parallel with S1 and S2. We present structural evidence supporting a role for the S3 path as an interaction surface for clathrin light chain. Finally, comparative analysis suggests the mode of binding between sla2p and clathrin light chain may be different in yeast.

  35. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    NASA Astrophysics Data System (ADS)

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases with an increasing number of processors used in the parallel computing.
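
    The coarse-grained pattern is straightforward: runs of XSTAR for different input parameters are independent, so they map onto a pool of workers. A sketch using Python's multiprocessing as a stand-in for MPI; the parameter grid and driver function are invented for illustration (MPI_XSTAR itself drives the external xstar executable):

      import multiprocessing as mp

      def run_one(ionization_param):
          # a real driver would launch the external code here, e.g. with
          # subprocess.run(); this stand-in just returns a completion tag
          return f"run xi={ionization_param} done"

      if __name__ == "__main__":
          grid = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # hypothetical parameter grid
          with mp.Pool(3) as pool:
              for msg in pool.imap_unordered(run_one, grid):
                  print(msg)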

  36. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  37. Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler.

    PubMed

    Jin, Ick Hoon; Yuan, Ying; Liang, Faming

    2013-10-01

    Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as an MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, a molecule synthetic network, and the dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.

  38. Run-time parallelization and scheduling of loops

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

    1991-01-01

    Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run-time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing, and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.
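
    A sketch of the inspector/executor split on a toy loop x[i] = x[reads[i]] + 1 whose index array is known only at run time (flow dependences only; the paper's inspectors handle the general case):

      from collections import defaultdict
      from concurrent.futures import ThreadPoolExecutor

      n = 8
      reads = [0, 0, 1, 2, 2, 4, 5, 6]   # index array, known only at run time
      x = [0] * n

      # Inspector: iteration i writes x[i] and reads x[reads[i]], so it depends
      # on the earlier iteration that produced x[reads[i]].
      wave = [0] * n
      for i in range(n):
          dep = reads[i]
          wave[i] = wave[dep] + 1 if dep < i else 0
      fronts = defaultdict(list)
      for i, w in enumerate(wave):
          fronts[w].append(i)

      # Executor: iterations inside one wavefront are independent and can run
      # in parallel; draining the map acts as a barrier between wavefronts.
      def body(i):
          x[i] = x[reads[i]] + 1

      with ThreadPoolExecutor() as pool:
          for w in sorted(fronts):
              list(pool.map(body, fronts[w]))
      print(x)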

  39. Why not make a PC cluster of your own? 5. AppleSeed: A Parallel Macintosh Cluster for Scientific Computing

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor K.; Dauger, Dean E.

    We have constructed a parallel cluster consisting of Apple Macintosh G4 computers running both the Classic Mac OS and the Unix-based Mac OS X, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. Unlike other Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.

  40. View looking SW at brick retaining wall running parallel to ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View looking SW at brick retaining wall running parallel to Jones Street showing bricked up storage vaults - Central of Georgia Railway, Savannah Repair Shops & Terminal Facilities, Brick Storage Vaults under Jones Street, Bounded by West Broad, Jones, West Boundary & Hull Streets, Savannah, Chatham County, GA

  41. How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing

    NASA Astrophysics Data System (ADS)

    Decyk, V. K.; Dauger, D. E.

    We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.

  42. Support for Online Calibration in the ALICE HLT Framework

    NASA Astrophysics Data System (ADS)

    Krzewicki, Mikolaj; Rohr, David; Zampolli, Chiara; Wiechula, Jens; Gorbunov, Sergey; Chauvin, Alex; Vorobyev, Ivan; Weber, Steffen; Schweda, Kai; Shahoyan, Ruben; Lindenstruth, Volker; ALICE Collaboration

    2017-10-01

    The ALICE detector employs subdetectors sensitive to environmental conditions such as pressure and temperature, e.g. the time projection chamber (TPC). A precise reconstruction of particle trajectories requires precise calibration of these detectors. Performing the calibration in real time in the HLT improves the online reconstruction and potentially renders certain offline calibration steps obsolete, speeding up offline physics analysis. For LHC Run 3, starting in 2020, when data reduction will rely on reconstructed data, online calibration becomes a necessity. In order to run the calibration online, the HLT now supports the processing of tasks that typically run offline. These tasks run massively in parallel on all HLT compute nodes and their output is gathered and merged periodically. The calibration results are both stored offline for later use and fed back into the HLT chain via a feedback loop in order to apply calibration information to the online track reconstruction. Online calibration and the feedback loop are subject to certain time constraints in order to provide up-to-date calibration information, and they must not interfere with ALICE data taking. Our approach of running these tasks in asynchronous processes enables us to separate them from normal data taking in a way that makes them failure resilient. We performed a first test of online TPC drift time calibration under real conditions during the heavy-ion run in December 2015. We present an analysis and conclusions of this first test, new improvements and developments based on it, as well as our current scheme for commissioning this for production use.

  43. Improved packing of protein side chains with parallel ant colonies.

    PubMed

    Quan, Lijun; Lü, Qiang; Li, Haiou; Xia, Xiaoyan; Wu, Hongjie

    2014-01-01

    The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, protein design and ligand docking applications. Many existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamers by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of χ1 and 77.11% of χ1+2 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4, and analysed the results from different perspectives, in terms of whole protein chains and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamer strategy. All results confirmed that our parallel approach is competitive with state-of-the-art solutions for packing side chains. This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a framework for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms.
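
    A generic sketch of the shared-pheromone pattern on a toy assignment problem (not pacoPacker's energy functions or rotamer library): in each round several ants build candidate solutions in parallel from one pheromone matrix, and the round's best solution reinforces the choices it made:

      import random
      from concurrent.futures import ThreadPoolExecutor

      n_pos, n_opt = 5, 4                       # positions x candidate choices (toy)
      target = [1, 3, 0, 2, 2]                  # hidden best choice per position
      pheromone = [[1.0] * n_opt for _ in range(n_pos)]

      def cost(sol):
          # toy energy: number of positions that miss the hidden optimum
          return sum(a != b for a, b in zip(sol, target))

      def ant(seed):
          rng = random.Random(seed)
          sol = [rng.choices(range(n_opt), weights=pheromone[p])[0]
                 for p in range(n_pos)]
          return cost(sol), sol

      with ThreadPoolExecutor() as pool:
          best_cost, best = n_pos + 1, None
          for rnd in range(30):
              results = list(pool.map(ant, range(rnd * 8, rnd * 8 + 8)))
              round_cost, round_best = min(results)
              if round_cost < best_cost:
                  best_cost, best = round_cost, round_best
              for p, choice in enumerate(round_best):   # evaporate, then reinforce
                  pheromone[p] = [0.9 * t for t in pheromone[p]]
                  pheromone[p][choice] += 1.0 / (1.0 + round_cost)
      print(best, best_cost)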

  44. Creating a Parallel Version of VisIt for Microsoft Windows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitlock, B J; Biagas, K S; Rawson, P L

    2011-12-07

    VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers, from modest desktops up to massively parallel clusters. VisIt is comprised of a set of cooperating programs. All programs can be run locally or in client/server mode, in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.

  5. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed Central

    Nadkarni, P. M.; Miller, P. L.

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632

  6. Partitioning problems in parallel, pipelined and distributed computing

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1985-01-01

    The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
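
    The bottleneck flavor of these partitioning problems can be sketched compactly: split a chain of module weights into p contiguous blocks so that the heaviest block is as light as possible. The binary-search-plus-greedy routine below is a generic illustration with invented weights, not Bokhari's Sum-Bottleneck path algorithm.

      def feasible(weights, p, cap):
          """Can the chain be cut into at most p contiguous blocks of load <= cap?"""
          blocks, load = 1, 0
          for w in weights:
              if w > cap:
                  return False
              if load + w > cap:
                  blocks, load = blocks + 1, w
              else:
                  load += w
          return blocks <= p

      def chain_partition_bottleneck(weights, p):
          """Binary-search the smallest feasible bottleneck value."""
          lo, hi = max(weights), sum(weights)
          while lo < hi:
              mid = (lo + hi) // 2
              if feasible(weights, p, mid):
                  hi = mid
              else:
                  lo = mid + 1
          return lo

      # 7 chain modules across 3 processors -> optimal bottleneck load of 13
      print(chain_partition_bottleneck([4, 7, 2, 9, 3, 6, 5], p=3))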

  7. Lower limb joint angles and ground reaction forces in forefoot strike and rearfoot strike runners during overground downhill and uphill running.

    PubMed

    Kowalski, Erik; Li, Jing Xian

    2016-11-01

    This study investigated the normal and parallel ground reaction forces during downhill and uphill running in habitual forefoot strike and habitual rearfoot strike (RFS) runners. Fifteen habitual forefoot strike and 15 habitual RFS recreational male runners ran at 3 m/s ± 5% during level, uphill and downhill overground running on a ramp mounted at 6° and 9°. Results showed that forefoot strike runners had no visible impact peak in all running conditions, while the impact peaks only decreased during the uphill conditions in RFS runners. Active peaks decreased during the downhill conditions in forefoot strike runners while active loading rates increased during downhill conditions in RFS runners. Compared to the level condition, parallel braking peaks were larger during downhill conditions and parallel propulsive peaks were larger during uphill conditions. Combined with previous biomechanics studies, our findings suggest that forefoot strike running may be an effective strategy to reduce impacts, especially during downhill running. These findings may have further implications towards injury management and prevention.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aoki, Kenji

    A read/write head for a magnetic tape includes an elongated chip assembly and a tape running surface formed in the longitudinal direction of the chip assembly. A pair of substantially spaced parallel read/write gap lines for supporting read/write elements extend longitudinally along the tape running surface of the chip assembly. Also, at least one groove is formed on the tape running surface on both sides of each of the read/write gap lines and extends substantially parallel to the read/write gap lines.

  9. The Nature of Bonding in Bulk Tellurium Composed of One-Dimensional Helical Chains.

    PubMed

    Yi, Seho; Zhu, Zhili; Cai, Xiaolin; Jia, Yu; Cho, Jun-Hyung

    2018-05-07

    Bulk tellurium (Te) is composed of one-dimensional (1D) helical chains which have been considered to be coupled by van der Waals (vdW) interactions. However, on the basis of first-principles density functional theory calculations, we here propose a different bonding nature between neighboring chains: i.e., helical chains made of normal covalent bonds are connected together by coordinate covalent bonds. It is revealed that the lone pairs of electrons of Te atoms participate in forming coordinate covalent bonds between neighboring chains, where each Te atom behaves as both an electron donor to neighboring chains and an electron acceptor from neighboring chains. This ligand-metal-like bonding nature in bulk Te results in the same order of bulk moduli along the directions parallel and perpendicular to the chains, contrasting with the large anisotropy of bulk moduli in vdW crystals. We further find that the electron effective masses parallel and perpendicular to the chains are almost the same as each other, consistent with the observed nearly isotropic electrical resistivity. It is thus demonstrated that the normal/coordinate covalent bonds parallel/perpendicular to the chains in bulk Te lead to a minor anisotropy in structural and transport properties.

  10. Crystal structure at 2.8 Å of the DLLRKN-containing coiled-coil domain of Huntingtin-interacting protein 1 (HIP1) reveals a surface suitable for clathrin light chain binding

    PubMed Central

    Ybe, Joel A.; Mishra, Sanjay; Helms, Stephen; Nix, Jay

    2007-01-01

    Summary: Huntingtin interacting protein 1 (HIP1) is a member of a family of proteins whose interaction with Huntingtin is critical to prevent cells from initiating apoptosis. HIP1 and the related protein HIP12/1R can also bind to clathrin and membrane phospholipids, and HIP12/1R links the clathrin-coated vesicle (CCV) to the actin cytoskeleton. HIP1 and HIP12/1R interact with the clathrin light chain EED regulatory site and stimulate clathrin lattice assembly. Here we report the X-ray structure of the coiled-coil domain of HIP1, residues 482–586, which includes residues crucial for binding clathrin light chain. The dimeric HIP1 crystal structure is partially splayed open. Comparison of the HIP1 model with coiled-coil predictions revealed that the heptad repeat in the dimeric trunk (S2 path) is offset relative to the register of the heptad repeat in the N-terminal portion (S1 path) of the molecule. Furthermore, surface analysis showed there is a third hydrophobic path (S3) running parallel to S1 and S2. We present structural evidence supporting a role for the S3 path as an interaction surface for clathrin light chain. Finally, comparative analysis suggests the mode of binding between sla2p and clathrin light chain may be different in yeast. PMID:17257618

  11. Run-time parallelization and scheduling of loops

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

    1990-01-01

    Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution-time preprocessing of the loop. At compile time, these methods set up the framework for performing a loop dependency analysis. At run time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce inspector procedures, which perform the execution-time preprocessing, and executors, transformed versions of the source code loop structures that carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indices can have a significant impact on performance. Furthermore, the overheads associated with this type of reordering are amortized when the loop is executed several times with the same dependency structure.
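
    The inspector/executor split can be sketched in a few lines. The toy inspector below handles flow dependences only (a real implementation must also treat anti- and output dependences); the index arrays and loop body are invented, and iterations within one wavefront could run concurrently.

      def inspector(reads, writes, n):
          """Group iterations 0..n-1 into wavefronts from run-time access sets."""
          last_writer, wave = {}, [0] * n
          for i in range(n):
              deps = [last_writer[a] for a in reads[i] if a in last_writer]
              wave[i] = 1 + max((wave[j] for j in deps), default=-1)
              for a in writes[i]:
                  last_writer[a] = i
          fronts = {}
          for i, w in enumerate(wave):
              fronts.setdefault(w, []).append(i)
          return [fronts[w] for w in sorted(fronts)]

      def executor(fronts, body):
          for front in fronts:      # wavefronts run in order...
              for i in front:       # ...iterations within one are independent
                  body(i)

      # toy loop x[idx[i]] += x[idx2[i]] with run-time index arrays
      idx, idx2 = [0, 1, 2, 3], [0, 1, 0, 1]
      reads = [{idx[i], idx2[i]} for i in range(4)]
      writes = [{idx[i]} for i in range(4)]
      fronts = inspector(reads, writes, 4)
      print(fronts)                 # [[0, 1], [2, 3]]
      executor(fronts, lambda i: print("running iteration", i))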

  12. The Construction and Validation of All-Atom Bulk-Phase Models of Amorphous Polymers Using the TIGER2/TIGER3 Empirical Sampling Method

    PubMed Central

    Li, Xianfeng; Murthy, Sanjeeva; Latour, Robert A.

    2011-01-01

    A new empirical sampling method termed “temperature intervals with global exchange of replicas and reduced radii” (TIGER3) is presented and demonstrated to efficiently equilibrate entangled long-chain molecular systems such as amorphous polymers. The TIGER3 algorithm is a replica exchange method in which simulations are run in parallel over a range of temperature levels at and above a designated baseline temperature. The replicas sampled at temperature levels above the baseline are run through a series of cycles, with each cycle containing four stages: heating, sampling, quenching, and temperature level reassignment. The method allows chain segments to pass through one another at elevated temperature levels during the sampling stage by reducing the van der Waals radii of the atoms, thus eliminating chain entanglement problems. Atomic radii are then returned to their regular values and re-equilibrated at elevated temperature prior to quenching to the baseline temperature. Following quenching, replicas are compared using a Metropolis Monte Carlo exchange process for the construction of an approximate Boltzmann-weighted ensemble of states and then reassigned to the elevated temperature levels for additional sampling. Further system equilibration is performed by periodic implementation of the previously developed TIGER2 algorithm between cycles of TIGER3, which applies thermal cycling without radii reduction. When coupled with a coarse-grained modeling approach, the combined TIGER2/TIGER3 algorithm yields fast equilibration of bulk-phase models of amorphous polymer, even for polymers with complex, highly branched structures. The developed method was tested by modeling the polyethylene melt. The calculated properties of chain conformation and chain segment packing agreed well with published data. The method was also applied to generate equilibrated structural models of three increasingly complex amorphous polymer systems: poly(methyl methacrylate), poly(butyl methacrylate), and DTB-succinate copolymer. The calculated glass transition temperature (Tg) and structural parameter profile (S(q)) for each resulting polymer model were found to be in close agreement with experimental Tg values and structural measurements obtained by X-ray diffraction, thus validating that the developed methods provide realistic models of amorphous polymer structure. PMID:21769156
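
    The replica-comparison step at the heart of such methods is a Metropolis test on the quenched energies. The fragment below is a minimal, hedged sketch of that single step, with made-up energies and temperature; it is not the TIGER2/TIGER3 code, which additionally manages heating, radius reduction, and temperature reassignment.

      import math, random

      def metropolis_accept(e_new, e_base, kT):
          """Accept a quenched replica of energy e_new against the current baseline
          state e_base with probability min(1, exp(-(e_new - e_base)/kT))."""
          return e_new <= e_base or random.random() < math.exp(-(e_new - e_base) / kT)

      kT, baseline_energy = 0.6, -120.0            # illustrative units
      for e in [-118.2, -121.5, -119.9]:           # energies of quenched replicas
          if metropolis_accept(e, baseline_energy, kT):
              baseline_energy = e                  # state joins the baseline ensemble
      print("baseline energy after one cycle:", baseline_energy)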

  13. A Concurrent Implementation of the Cascade-Correlation Algorithm, Using the Time Warp Operating System

    NASA Technical Reports Server (NTRS)

    Springer, P.

    1993-01-01

    This paper discusses the way in which the Cascade-Correlation algorithm was parallelized so that it could be run using the Time Warp Operating System (TWOS). TWOS is a special-purpose operating system designed to run parallel discrete event simulations with maximum efficiency on parallel or distributed computers.

  14. Improved packing of protein side chains with parallel ant colonies

    PubMed Central

    2014-01-01

    Introduction: The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, protein design, and ligand docking applications. Many existing solutions model packing as a computational optimisation problem. Beyond the design of search algorithms, most solutions suffer from inaccurate energy functions for judging whether a prediction is good or bad; even if the search finds the lowest energy, there is no certainty of obtaining protein structures with correct side chains. Methods: We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamers by rotamer minimisation, which mitigated the discreteness of the rotamer library. Results: We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods such as CIS-RR and SCWRL4, and analysed the results from different perspectives, in terms of whole protein chains and individual residues. In this comprehensive benchmark, the predictions for 51.5% of proteins within a length of 400 amino acids were simultaneously superior to the results of both CIS-RR and SCWRL4. Finally, we also showed the advantage of the subrotamer strategy. All results confirm that our parallel approach is competitive with state-of-the-art solutions for packing side chains. Conclusions: This parallel approach combines various sources of search intelligence and energy functions to pack protein side chains. It provides a framework for combining objective functions of differing accuracy and usefulness by designing parallel heuristic search algorithms. PMID:25474164
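
    The shared-pheromone idea can be sketched generically: several colonies, each scoring conformations with a different energy function, reinforce one pheromone matrix, so rotamer choices favoured by all energy functions accumulate the most weight. Everything below (energies, sizes, update rule) is invented for illustration; it is not pacoPacker itself.

      import random

      n_residues, n_rotamers = 5, 4
      pheromone = [[1.0] * n_rotamers for _ in range(n_residues)]

      def energy_a(res, rot): return (rot - 1) ** 2             # both toy energies
      def energy_b(res, rot): return abs(rot - 1) + 0.1 * res   # prefer rotamer 1

      def ant_walk():
          """Pick one rotamer per residue, biased by the shared pheromone."""
          return [random.choices(range(n_rotamers), weights=pheromone[r])[0]
                  for r in range(n_residues)]

      for step in range(200):
          for energy in (energy_a, energy_b):      # one "colony" per energy function
              conf = ant_walk()
              score = sum(energy(r, rot) for r, rot in enumerate(conf))
              for r, rot in enumerate(conf):       # low-energy walks deposit more
                  pheromone[r][rot] += 1.0 / (1.0 + score)
          pheromone = [[0.95 * p for p in row] for row in pheromone]  # evaporation

      print("most reinforced rotamer per residue:",
            [max(range(n_rotamers), key=row.__getitem__) for row in pheromone])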

  15. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 2^18 processor cores with problem sizes up to 2 billion degrees of freedom. Written in object-oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003; at the time of this writing this means gcc 4.7.x, Intel 12.1.x and PGI compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into the memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the restriction of the HDF5 file format used for parallel I/O to 32-bit integers. Noting that 2^32 = 4,294,967,296, this gives an estimate of the maximum problem size that can currently be run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.

  16. 3D printed soft parallel actuator

    NASA Astrophysics Data System (ADS)

    Zolfagharian, Ali; Kouzani, Abbas Z.; Khoo, Sui Yang; Noshadi, Amin; Kaynak, Akif

    2018-04-01

    This paper presents a 3-dimensional (3D) printed soft parallel contactless actuator for the first time. The actuator involves an electro-responsive parallel mechanism made of two segments, namely an active chain and a passive chain, both 3D printed. The active chain is attached to the ground at one end and constitutes two actuator links made of responsive hydrogel. The passive chain, on the other hand, is attached to the active chain at one end and consists of two rigid links made of polymer. The actuator links are printed using an extrusion-based 3D-Bioplotter with polyelectrolyte hydrogel as printer ink. The rigid links are printed by a 3D fused deposition modelling (FDM) printer with acrylonitrile butadiene styrene (ABS) as print material. The kinematics model of the soft parallel actuator is derived via transformation matrix notation to simulate and determine the workspace of the actuator. The printed soft parallel actuator is then immersed into NaOH solution with a specific voltage applied to it via two contactless electrodes. The experimental data are then collected and used to develop a parametric model that estimates the end-effector position and regulates the kinematics model in response to a specific input voltage over time. It is observed that the electroactive actuator demonstrates the expected behaviour according to the simulation of its kinematics model. The use of 3D printing for the fabrication of parallel soft actuators opens a new chapter in manufacturing sophisticated soft actuators with high dexterity and mechanical robustness for biomedical applications such as cell manipulation and drug release.
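
    The transformation-matrix kinematics mentioned above reduces, for a planar chain, to multiplying small homogeneous matrices. The sketch below computes the tip position of a two-link chain; the link lengths and bend angles are made-up values, not the actuator's identified parameters.

      import numpy as np

      def T(theta, length):
          """Homogeneous 2D transform: rotate by theta, then advance by length."""
          c, s = np.cos(theta), np.sin(theta)
          return np.array([[c, -s, length * c],
                           [s,  c, length * s],
                           [0,  0, 1.0]])

      def end_effector(thetas, lengths):
          M = np.eye(3)
          for th, L in zip(thetas, lengths):
              M = M @ T(th, L)
          return M[:2, 2]          # (x, y) of the chain tip

      # hypothetical active chain bending 20 deg then 15 deg, with 10 mm links
      print(end_effector(np.radians([20, 15]), [10.0, 10.0]))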

  17. SU-F-SPS-09: Parallel MC Kernel Calculations for VMAT Plan Improvement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chamberlain, S; Roswell Park Cancer Institute, Buffalo, NY; French, S

    Purpose: Adding kernels (small perturbations in leaf positions) to the existing apertures of VMAT control points may improve plan quality. We investigate the calculation of kernel doses using a parallelized Monte Carlo (MC) method. Methods: A clinical prostate VMAT DICOM plan was exported from Eclipse. An arbitrary control point and leaf were chosen, and a modified MLC file was created, corresponding to the leaf position offset by 0.5 cm. The additional dose produced by this 0.5 cm × 0.5 cm kernel was calculated using the DOSXYZnrc component module of BEAMnrc. A range of particle history counts were run (varying from 3 × 10^6 to 3 × 10^7); each job was split among 1, 10, or 100 parallel processes. A particle count of 3 × 10^6 was established as the lower end of the range because it provided the minimal accuracy level. Results: As expected, an increase in particle count linearly increases run time. For the lowest particle count, the time varied from 30 hours for the single-processor run to 0.30 hours for the 100-processor run. Conclusion: Parallel processing of MC calculations in the EGS framework significantly decreases the time necessary for each kernel dose calculation. Particle counts lower than 1 × 10^6 have too large an error to output accurate dose for a Monte Carlo kernel calculation. Future work will investigate increasing the number of parallel processes and optimizing run times for multiple kernel calculations.
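
    Splitting a fixed budget of particle histories across processes and summing the partial tallies is the essence of the parallelization described. The sketch below is a hedged, self-contained illustration using Python's multiprocessing and a toy per-history deposit; it is not DOSXYZnrc/BEAMnrc, and the per-worker seeding is simplified.

      import random
      from multiprocessing import Pool

      def run_histories(args):
          """Run n particle histories with an independent RNG stream; return dose."""
          n, seed = args
          rng = random.Random(seed)
          return sum(rng.expovariate(1.0) for _ in range(n))  # toy deposit model

      if __name__ == "__main__":
          total, workers = 3_000_000, 10
          chunks = [(total // workers, seed) for seed in range(workers)]
          with Pool(workers) as pool:
              dose = sum(pool.map(run_histories, chunks))
          print(f"mean deposit per history: {dose / total:.4f}")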

  18. New Factorization Techniques and Parallel O(log N) Algorithms for Forward Dynamics Solution of Single Closed-Chain Robot Manipulators

    NASA Technical Reports Server (NTRS)

    Fijany, Amir

    1993-01-01

    In this paper, parallel O(log N) algorithms are developed for the dynamic simulation of a single closed-chain rigid multibody system, specialized to the case of a robot manipulator in contact with the environment.

  19. Parallel 3D Multi-Stage Simulation of a Turbofan Engine

    NASA Technical Reports Server (NTRS)

    Turner, Mark G.; Topp, David A.

    1998-01-01

    A 3D multistage simulation of each component of a modern GE turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits parallel computation using two levels of parallelism: each blade row is run in parallel, and each blade row grid is decomposed into several domains that are also run in parallel. Twenty processors are used for the 4-blade-row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scalable with the number of blade rows. Enough flips are run (between 50 and 200) that the solution in the entire machine is no longer changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This helped isolate many small parallel bugs and guarantee that the parallelization was done correctly. The domain decomposition is done only in the axial direction, since the number of points axially is much larger than in the other two directions. The code uses MPI for message passing. Parallel speed-up of the solver portion (no I/O or body force calculation) was measured for a grid with 227 points axially.

  20. Running Parallel Discrete Event Simulators on Sierra

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnes, P. D.; Jefferson, D. R.

    2015-12-03

    In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.

  1. Novel molecular targets for kRAS downregulation: promoter G-quadruplexes

    DTIC Science & Technology

    2016-11-01

    conditions, and described the structure as having mixed parallel/anti-parallel loops of lengths 2:8:10 in the 5'-3' direction. Using selective small... and anti-parallel loop directionality of lengths 4:10:8 in the 5'-3' direction, three tetrads stacked, and involving guanines in runs B, C, E, and F... a tri-stacked structure incorporating runs B, C, E and F with intervening loops of 2, 10, and 8 bases in the 5'-3' direction.

  2. A new series of oxides derived from the α-U3O8 structure: MIIUMo4O16

    NASA Astrophysics Data System (ADS)

    Lee, M. R.; Jaulmes, S.

    1987-04-01

    A new family of isotypical oxides MIIUMo4O16 (MII = Mg, Mn, Cd, Ca, Hg, Sr, Pb) is identified. The structure of the compound with Ca was determined by X-ray diffraction. It is triclinic, space group P-1, with a = 13.239(5) Å, b = 6.651(2) Å, c = 8.236(3) Å, α = 90°00(4), β = 90°38(4), γ = 120°16(3), Z = 2. The final R index and the weighted Rw index are 0.049 and 0.040, respectively. The cell is related to the orthorhombic one of α-U3O8 by a = 2a0, b = -(a0 + b0)/2, c = 2c0. The structure, reminiscent of that of α-U3O8, consists of chains of [Ca,U]O7 pentagonal bipyramids and MoO6 octahedra, running parallel to the c axis. The U-O distances along the U-O-Ca-O chains are shortened to 1.77(1) Å. The uranyl ion was characterized by its IR spectrum.

  3. A Multiobjective Optimization Framework for Online Stochastic Optimal Control in Hybrid Electric Vehicles

    DOE PAGES

    Malikopoulos, Andreas

    2015-01-01

    The increasing urgency to extract additional efficiency from hybrid propulsion systems has led to the development of advanced power management control algorithms. In this paper we address the problem of online optimization of the supervisory power management control in parallel hybrid electric vehicles (HEVs). We model HEV operation as a controlled Markov chain and we show that the control policy yielding the Pareto optimal solution minimizes online the long-run expected average cost per unit time criterion. The effectiveness of the proposed solution is validated through simulation and compared to the solution derived with dynamic programming using the average cost criterion. Both solutions achieved the same cumulative fuel consumption, demonstrating that the online Pareto control policy is an optimal control policy.
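
    For a fixed policy, the long-run expected average cost per unit time of a Markov chain is the stationary distribution dotted with the per-state cost. The tiny sketch below illustrates only that criterion; the 3-state chain and the costs are made up and are not the paper's HEV model or its Pareto policy computation.

      import numpy as np

      P = np.array([[0.8, 0.2, 0.0],    # row-stochastic transition matrix
                    [0.1, 0.7, 0.2],    # (one row per operating state)
                    [0.0, 0.3, 0.7]])
      c = np.array([1.0, 0.4, 2.5])     # hypothetical fuel cost per unit time

      # stationary distribution: left eigenvector of P for eigenvalue 1
      w, v = np.linalg.eig(P.T)
      pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
      pi /= pi.sum()

      print("long-run average cost per unit time:", float(pi @ c))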

  4. Coding coarse grained polymer model for LAMMPS and its application to polymer crystallization

    NASA Astrophysics Data System (ADS)

    Luo, Chuanfu; Sommer, Jens-Uwe

    2009-08-01

    We present a patch code for LAMMPS to implement a coarse grained (CG) model of poly(vinyl alcohol) (PVA). LAMMPS is a powerful molecular dynamics (MD) simulator developed at Sandia National Laboratories. Our patch code implements a tabulated angular potential and Lennard-Jones-9-6 (LJ96) style interaction for PVA. Benefiting from the excellent parallel efficiency of LAMMPS, our patch code is suitable for large-scale simulations. This CG-PVA code is used to study polymer crystallization, which is a long-standing unsolved problem in polymer physics. By using parallel computing, cooling and heating processes for long chains are simulated. The results show that chain-folded structures resembling the lamellae of polymer crystals are formed during the cooling process. The evolution of the static structure factor during the crystallization transition indicates that long-range density order appears before local crystalline packing, which is consistent with some experimental observations by small/wide angle X-ray scattering (SAXS/WAXS). During the heating process, it is found that the crystalline regions continue growing until they are fully melted, as confirmed by the evolution of both the static structure factor and the average stem length formed by the chains. This two-stage behavior indicates that the melting of polymer crystals is far from thermodynamic equilibrium. Our results concur with various experiments; this is the first time that such growth/reorganization behavior has been clearly observed in MD simulations. Our code can easily be used to model other types of polymers by providing a file containing the tabulated angle potential data and a set of appropriate parameters. Program summary: Program title: lammps-cgpva. Catalogue identifier: AEDE_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDE_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GNU's GPL. No. of lines in distributed program, including test data, etc.: 940,798. No. of bytes in distributed program, including test data, etc.: 12,536,245. Distribution format: tar.gz. Programming language: C++/MPI. Computer: Tested on Intel-x86 and AMD64 architectures; should run on any architecture providing a C++ compiler. Operating system: Tested under Linux; any other OS with a C++ compiler and MPI library should suffice. Has the code been vectorized or parallelized?: Yes. RAM: Depends on system size and how many CPUs are used. Classification: 7.7. External routines: LAMMPS (http://lammps.sandia.gov/), FFTW (http://www.fftw.org/). Nature of problem: Implementing special tabulated angle potentials and Lennard-Jones-9-6 style interactions of a coarse grained polymer model for the LAMMPS code. Solution method: Cubic spline interpolation of input tabulated angle potential data. Restrictions: The code is based on a former version of LAMMPS. Unusual features: Any special angular potential can be used if it can be tabulated. Running time: Seconds to weeks, depending on system size, CPU speed and how many CPUs are used; the test run provided with the package takes about 5 minutes on 4 AMD Opteron (2.6 GHz) CPUs. References: D. Reith, H. Meyer, F. Müller-Plathe, Macromolecules 34 (2001) 2335-2345. H. Meyer, F. Müller-Plathe, J. Chem. Phys. 115 (2001) 7807. H. Meyer, F. Müller-Plathe, Macromolecules 35 (2002) 1241-1252.
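
    The record's stated solution method, cubic-spline interpolation of a tabulated angle potential, is easy to show in miniature. The sketch below uses SciPy rather than the C++ patch itself, and the five table entries are invented; the spline's first derivative is what a force evaluation would need.

      import numpy as np
      from scipy.interpolate import CubicSpline

      angles = np.array([60.0, 90.0, 120.0, 150.0, 180.0])   # theta, degrees
      energies = np.array([4.1, 1.8, 0.4, 0.1, 0.0])         # tabulated E(theta)

      spline = CubicSpline(angles, energies)

      theta = 137.5
      print(f"E({theta}) = {float(spline(theta)):.3f}")       # interpolated energy
      print(f"dE/dtheta = {float(spline(theta, 1)):.4f}")     # derivative, for forces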

  5. Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

    PubMed Central

    Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

    2014-01-01

    The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting. PMID:27081299

  6. Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

    PubMed

    Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

    2014-11-01

    The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting.
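
    The MLEM iteration that both records above parallelize is compact enough to state directly: x_{k+1} = x_k * A^T(y / (A x_k)) / (A^T 1). The NumPy sketch below runs it on a made-up 8x4 system matrix with noiseless data; there is no Spark/GraphX here, only the update rule itself.

      import numpy as np

      rng = np.random.default_rng(0)
      A = rng.uniform(0.0, 1.0, size=(8, 4))   # toy system (projection) matrix
      x_true = np.array([1.0, 3.0, 0.5, 2.0])
      y = A @ x_true                            # noiseless measurements

      x = np.ones(4)                            # positive initial estimate
      sens = A.T @ np.ones(8)                   # sensitivity image A^T 1
      for _ in range(200):
          x *= (A.T @ (y / (A @ x))) / sens     # multiplicative MLEM update

      print(np.round(x, 3))                     # converges toward x_true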

  7. Scalable computing for evolutionary genomics.

    PubMed

    Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert

    2012-01-01

    Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster and pipeline in a few steps. This allows researchers to scale up computations from their desktop, using available hardware, whenever required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages of interest to evolutionary biology are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. In addition to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives on creating and building such images.

  8. The parallel reaction monitoring method contributes to a highly sensitive polyubiquitin chain quantification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tsuchiya, Hikaru; Tanaka, Keiji, E-mail: tanaka-kj@igakuken.or.jp; Saeki, Yasushi, E-mail: saeki-ys@igakuken.or.jp

    2013-06-28

    Highlights: • The parallel reaction monitoring method was applied to ubiquitin quantification. • The ubiquitin PRM method is highly sensitive even in biological samples. • Using the method, we revealed that Ufd4 assembles the K29-linked ubiquitin chain. Abstract: Ubiquitylation is an essential posttranslational protein modification that is implicated in a diverse array of cellular functions. Although cells contain eight structurally distinct types of polyubiquitin chains, the detailed function of several chain types, including K29-linked chains, has remained largely unclear. Current mass spectrometry (MS)-based quantification methods are highly inefficient for low-abundance atypical chains, such as K29- and M1-linked chains, in complex mixtures that typically contain highly abundant proteins. In this study, we applied parallel reaction monitoring (PRM), a quantitative, high-resolution MS method, to quantify ubiquitin chains. The ubiquitin PRM method allows us to quantify 100 attomole amounts of all possible ubiquitin chains in cell extracts. Furthermore, we quantified ubiquitylation levels of ubiquitin-proline-β-galactosidase (Ub-P-βgal), a historically known model substrate of the ubiquitin fusion degradation (UFD) pathway. In wild-type cells, Ub-P-βgal is modified with ubiquitin chains consisting of 21% K29- and 78% K48-linked chains. In contrast, K29-linked chains are not detected in UFD4 knockout cells, suggesting that Ufd4 assembles the K29-linked ubiquitin chain(s) on Ub-P-βgal in vivo. Thus, ubiquitin PRM is a novel, useful, quantitative method for analyzing the highly complicated ubiquitin system.

  9. Fatigue-induced changes in decline running.

    PubMed

    Mizrahi, J; Verbitsky, O; Isakov, E

    2001-03-01

    Study the relation between muscle fatigue during eccentric muscle contractions and kinematics of the legs in downhill running. Decline running on a treadmill was used to acquire data on shock accelerations, muscle activity and kinematics, for comparison with level running. In downhill running, local muscle fatigue is the cause of morphological muscle damage which leads to reduced attenuation of shock accelerations. Fourteen subjects ran on a treadmill above level-running anaerobic threshold speed for 30 min, in level and -4 degrees decline running. The following were monitored: metabolic fatigue by means of respiratory parameters; muscle fatigue of the quadriceps by means of elevation in myoelectric activity; and kinematic parameters including knee and ankle angles and hip vertical excursion by means of computerized videography. Data on shock transmission reported in previous studies were also used. Quadriceps fatigue develops in parallel to an increasing vertical excursion of the hip in the stance phase of running, enabled by larger dorsi flexion of the ankle rather than by increased flexion of the knee. The decrease in shock attenuation can be attributed to quadriceps muscle fatigue in parallel to increased vertical excursion of the hips.

  10. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low-cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP instance handles small messages for many users, while other TCP instances running in parallel provide high-bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism), also with near linear speed-ups.

  11. Real-world hydrologic assessment of a fully-distributed hydrological model in a parallel computing environment

    NASA Astrophysics Data System (ADS)

    Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.

    2011-10-01

    Summary: A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up, with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models is now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.

  12. Passing in Command Line Arguments and Parallel Cluster/Multicore Batching in R with batch.

    PubMed

    Hoffmann, Thomas J

    2011-03-01

    It is often useful to rerun a command-line R script with some slight change in the parameters used to run it - a new set of parameters for a simulation, a different dataset to process, etc. The R package batch provides a means to pass multiple command-line options, including vectors of values in the usual R format, easily into R. The same script can be set up to run things in parallel via different command-line arguments. The R package batch also provides a means to simplify this parallel batching by allowing one to use R and an R-like syntax for arguments to spread a script across a cluster or local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally, it provides a means to aggregate the results of multiple processes run on a cluster.

  13. Crashworthiness simulations with DYNA3D

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schauer, D.A.; Hoover, C.G.; Kay, G.J.

    1996-04-01

    Current progress in parallel algorithm research and applications in vehicle crash simulation is described for the explicit, finite element algorithms in DYNA3D. Problem partitioning methods and parallel algorithms for contact at material interfaces are the two challenging algorithm research problems that are addressed. Two prototype parallel contact algorithms have been developed for treating the cases of local and arbitrary contact. Demonstration problems for local contact are crashworthiness simulations with 222 locally defined contact surfaces and a vehicle/barrier collision modeled with arbitrary contact. A simulation of crash tests conducted for a vehicle impacting a U-channel small sign post embedded in soil has been run on both the serial and parallel versions of DYNA3D. A significant reduction in computational time has been observed when running these problems on the parallel version. However, to achieve maximum efficiency, complex problems must be appropriately partitioned, especially when contact dominates the computation.

  14. Simulation of LHC events on a million threads

    NASA Astrophysics Data System (ADS)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.; Papka, M. E.; Benjamin, D. P.

    2015-12-01

    Demand for Grid resources is expected to double during LHC Run II as compared to Run I; the capacity of the Grid, however, will not double. The HEP community must consider how to bridge this computing gap by targeting larger compute resources and using the available compute resources as efficiently as possible. Argonne's Mira, the fifth fastest supercomputer in the world, can run roughly five times the number of parallel processes that the ATLAS experiment typically uses on the Grid. We ported Alpgen, a serial x86 code, to run as a parallel application under MPI on the Blue Gene/Q architecture. By analysis of the Alpgen code, we reduced the memory footprint to allow running 64 threads per node, utilizing the four hardware threads available per core on the PowerPC A2 processor. Event generation and unweighting, typically run as independent serial phases, are coupled together in a single job in this scenario, reducing intermediate writes to the filesystem. By these optimizations, we have successfully run LHC proton-proton physics event generation at the scale of a million threads, filling two-thirds of Mira.

  15. Plasma Physics Calculations on a Parallel Macintosh Cluster

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor; Dauger, Dean; Kokelaar, Pieter

    2000-03-01

    We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.

  16. Plasma Physics Calculations on a Parallel Macintosh Cluster

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor K.; Dauger, Dean E.; Kokelaar, Pieter R.

    We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 Mflops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.

  17. Determination of backbone chain direction of PDA using FFM

    NASA Astrophysics Data System (ADS)

    Jo, Sadaharu; Okamoto, Kentaro; Takenaga, Mitsuru

    2010-01-01

    The effect of backbone chains on friction force was investigated on both Langmuir-Blodgett (LB) films of 10,12-heptacosadiynoic acid and the (0 1 0) surfaces of single crystals of 2,4-hexadiene-1,6-diol using friction force microscopy (FFM). It was observed that friction force decreased when the scanning direction was parallel to the [0 0 1] direction in both samples. Moreover, friction force decreased when the scanning direction was parallel to the crystallographic [1 0 2], [1 0 1], [1 0 0] and [1 0 1¯] directions in only the single crystals. For the LB films, the [0 0 1] direction corresponds to the backbone chain direction of 10,12-heptacosadiynoic acid. For the single crystals, both the [0 0 1] and [1 0 1] directions correspond to the backbone chain direction, and the [1 0 2], [1 0 0] and [1 0 1¯] directions correspond to the low-index crystallographic direction. In both the LB films and single crystals, the friction force was minimized when the directions of scanning and the backbone chain were parallel.

  18. Using Parallel Processing for Problem Solving.

    DTIC Science & Technology

    1979-12-01

    ...are the basic parallel processing primitive. Different goals of the system can be pursued in parallel by placing them in separate activities. Language primitives are provided for manipulating running activities. Viewpoints are a generalization of contexts...

  19. Polymer brushes in explicit poor solvents studied using a new variant of the bond fluctuation model

    NASA Astrophysics Data System (ADS)

    Jentzsch, Christoph; Sommer, Jens-Uwe

    2014-09-01

    Using a variant of the Bond Fluctuation Model which improves its parallel efficiency, in particular when running on graphics cards, we perform large scale simulations of polymer brushes in poor explicit solvent. Grafting density, solvent quality, and chain length are varied. Different morphological structures, in particular octopus micelles, are observed for low grafting densities. We reconsider the theoretical model for octopus micelles proposed by Williams using scaling arguments, with the relevant scaling variable being σ/σc and the characteristic grafting density given by σc ~ N^(-4/3). We find that octopus micelles only grow laterally, but not in height, and we propose an extension of the model by assuming a cylindrical shape instead of a spherical geometry for the micelle core. We show that the scaling variable σ/σc can be applied to obtain master plots for the averaged height of the brush, the size of the micelles, and the number of chains per micelle. The exponents in the corresponding power law relations for the grafting density and chain length are in agreement with the model for flat cylindrical micelles. We also investigate the surface roughness and find that polymer brushes in explicit poor solvent at grafting densities higher than the stretching transition are flat, and surface rippling can only be observed close to the stretching transition.

  20. Experiences using OpenMP based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

    2003-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  1. On the difference between the pyroxenes LiFeSi2O6 and LiFeGe2O6 in their magnetic structures and spin orientations

    NASA Astrophysics Data System (ADS)

    Lee, Changhoon; Hong, Jisook; Shim, Ji Hoon; Whangbo, Myung-Hwan

    2014-03-01

    The clinopyroxenes LiFeSi2O6 and LiFeGe2O6, crystallizing in the monoclinic space group P21/c, are isostructural and isoelectronic. Their crystal structures are made up of zigzag chains of edge-sharing FeO6 octahedra containing high-spin Fe3+ ions, which run along the c direction. Despite this structural similarity, the two have quite different magnetic structures and spin orientations. In LiFeSi2O6 the Fe spins have a ferromagnetic (FM) coupling within the zigzag chains along c, and such FM chains have an antiferromagnetic (AFM) coupling along a. In contrast, in LiFeGe2O6 the spins have an AFM coupling within the zigzag chains along c, and such AFM chains have an ↑↑↓↓ coupling along a. In addition, the spin orientation is parallel to c in LiFeSi2O6, but perpendicular to c in LiFeGe2O6. To explain these differences in magnetic structure and spin orientation, we evaluated the spin exchange parameters by performing energy-mapping analysis based on LDA+U and GGA+U calculations, and also evaluated the magnetocrystalline anisotropy energies in terms of GGA+U+SOC and LDA+U+SOC calculations. Our study shows that the magnetic structures and spin orientations of LiFeSi2O6 and LiFeGe2O6 are better described by LDA+U and LDA+U+SOC calculations. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2013R1A1A2060341).

  2. Scalable load balancing for massively parallel distributed Monte Carlo particle transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Brien, M. J.; Brantley, P. S.; Joy, K. I.

    2013-07-01

    In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time O(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrence Livermore National Laboratory.
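
    One way to picture iterated pair-wise balancing is a dimension-exchange scheme on 2^k processors: in round d, each processor averages its load with the partner whose rank differs in bit d, and after log2(N) rounds every processor holds the global mean. The sketch below illustrates that O(log N) idea with made-up loads; it is not the paper's algorithm, which balances without this lockstep schedule at far larger scales.

      loads = [27.0, 3.0, 14.0, 8.0, 41.0, 5.0, 16.0, 10.0]  # one per processor
      n = len(loads)                                          # power of two

      d = 1
      while d < n:                    # log2(n) rounds
          for p in range(n):
              q = p ^ d               # partner's rank differs in exactly one bit
              if p < q:               # balance each pair once per round
                  loads[p] = loads[q] = (loads[p] + loads[q]) / 2.0
          d <<= 1

      print(loads)                    # all entries equal the mean load (15.5)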

  3. Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

    NASA Technical Reports Server (NTRS)

    Fricker, David M.

    1997-01-01

    The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.

  4. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Gryphon, Coranth D.; Miller, Mark D.

    1991-01-01

    PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.

  5. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  6. Network support for system initiated checkpoints

    DOEpatents

    Chen, Dong; Heidelberger, Philip

    2013-01-29

    A system, method and computer program product for supporting system initiated checkpoints in parallel computing systems. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity.

  7. Effects of prolonged strenuous endurance exercise on plasma myosin heavy chain fragments and other muscular proteins. Cycling vs running.

    PubMed

    Koller, A; Mair, J; Schobersberger, W; Wohlfarter, T; Haid, C; Mayr, M; Villiger, B; Frey, W; Puschendorf, B

    1998-03-01

    This study evaluates creatine kinase, myosin heavy chain, and cardiac troponin blood levels following three types of exercise: 1) short-distance uphill or downhill running; 2) alpine ultramarathon; and 3) alpine long-distance cycling. Comparative field study; follow-up up to 10 days. Department of Sports Medicine. All biochemical markers were analysed at the Department of Medical Chemistry and Biochemistry. Subjects included healthy, trained males (N = 53). All subjects were nonsmokers and free from medication prior to and during the study. Each volunteer was an experienced runner or cyclist who had at least once successfully finished the Swiss Alpine Marathon of Davos or the Otztal-Radmarathon before. Running or cycling. Plasma concentrations of creatine kinase, myosin heavy chain fragments and cardiac troponins were measured to diagnose skeletal and cardiac muscle damage, respectively. Skeletal muscle protein release is markedly different between uphill and downhill running, with very little evidence for muscle damage in the uphill runners. There is considerable muscle protein leakage in the ultramarathoners (67 km distance; 30 km downhill running). In contrast, only modest amounts of skeletal muscle damage are found after alpine long-distance cycling (230 km distance). This study shows that there is slow-twitch skeletal muscle fiber damage after prolonged strenuous endurance exercise and short-distance downhill running. Exhaustive endurance exercise involving downhill running and short-distance downhill running lead to more pronounced injury than strenuous endurance exercise involving concentric actions. From our results there is no reason for suggesting that prolonged intense exercise may induce myocardial injury in symptomless athletes without cardiac disease.

  8. Parallel Computational Protein Design.

    PubMed

    Zhou, Yichao; Donald, Bruce R; Zeng, Jianyang

    2017-01-01

    Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that is guaranteed to find the global minimum energy conformation (GMEC) is to combine both dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computational bottleneck of a large-scale computational protein design process. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab (Gainza et al., Methods Enzymol 523:87, 2013) to implement a GPU-based massively parallel A* algorithm for improving the protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedup in large protein design cases with a small memory overhead compared to the traditional A* search algorithm implementation, while still guaranteeing optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle problems in which the conformation space is too large and the global optimal solution could not be computed previously. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with state-of-the-art rotamer pruning algorithms such as iMinDEE (Gainza et al., PLoS Comput Biol 8:e1002335, 2012) and DEEPer (Hallen et al., Proteins 81:18-39, 2013) to also consider continuous backbone and side-chain flexibility.

  9. BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs

    PubMed Central

    Eklund, Anders; Dufort, Paul; Villani, Mattias; LaConte, Stephen

    2014-01-01

    Analysis of functional magnetic resonance imaging (fMRI) data is becoming ever more computationally demanding as temporal and spatial resolutions improve, and large, publicly available data sets proliferate. Moreover, methodological improvements in the neuroimaging pipeline, such as non-linear spatial normalization, non-parametric permutation tests and Bayesian Markov Chain Monte Carlo approaches, can dramatically increase the computational burden. Despite these challenges, there do not yet exist any fMRI software packages which leverage inexpensive and powerful graphics processing units (GPUs) to perform these analyses. Here, we therefore present BROCCOLI, a free software package written in OpenCL (Open Computing Language) that can be used for parallel analysis of fMRI data on a large variety of hardware configurations. BROCCOLI has, for example, been tested with an Intel CPU, an Nvidia GPU, and an AMD GPU. These tests show that parallel processing of fMRI data can lead to significantly faster analysis pipelines. This speedup can be achieved on relatively standard hardware, but further, dramatic speed improvements require only a modest investment in GPU hardware. BROCCOLI (running on a GPU) can perform non-linear spatial normalization to a 1 mm³ brain template in 4–6 s, and run a second-level permutation test with 10,000 permutations in about a minute. These non-parametric tests are generally more robust than their parametric counterparts, and can also enable more sophisticated analyses by estimating complicated null distributions. Additionally, BROCCOLI includes support for Bayesian first-level fMRI analysis using a Gibbs sampler. The new software is freely available under GNU GPL3 and can be downloaded from GitHub (https://github.com/wanderine/BROCCOLI/). PMID:24672471
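
    The parallelism in a permutation test is easy to see in code. Below is a minimal CPU sketch of a second-level permutation test, assuming sign-flip permutations and a maximum statistic; BROCCOLI itself runs this on the GPU via OpenCL, so the data shapes and statistic here are purely illustrative:

      import numpy as np
      from multiprocessing import Pool

      rng = np.random.default_rng(0)
      data = rng.normal(0.1, 1.0, size=(20, 5000))  # 20 subjects x 5000 voxels
      observed = data.mean(axis=0).max()            # observed max statistic

      def perm_max_stat(seed):
          # One permutation: random sign flips of subject maps, then max over voxels
          r = np.random.default_rng(seed)
          signs = r.choice([-1.0, 1.0], size=data.shape[0])[:, None]
          return (signs * data).mean(axis=0).max()

      if __name__ == "__main__":
          with Pool() as pool:                      # permutations are independent
              null = np.array(pool.map(perm_max_stat, range(10_000)))
          p = (1 + (null >= observed).sum()) / (1 + null.size)
          print(f"FWE-corrected p-value: {p:.4f}")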

  10. Students' Adoption of Course-Specific Approaches to Learning in Two Parallel Courses

    ERIC Educational Resources Information Center

    Öhrstedt, Maria; Lindfors, Petra

    2016-01-01

    Research on students' adoption of course-specific approaches to learning in parallel courses is limited and inconsistent. This study investigated second-semester psychology students' levels of deep, surface and strategic approaches in two courses running in parallel within a real-life university setting. The results showed significant differences…

  11. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele

    2001-01-01

    This viewgraph presentation provides information on support tools available for the automatic parallelization of computer programs. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence ensuring that the code transformation was accurate.

  12. Quantum communication beyond the localization length in disordered spin chains.

    PubMed

    Allcock, Jonathan; Linden, Noah

    2009-03-20

    We study the effects of localization on quantum state transfer in spin chains. We show how to use quantum error correction and multiple parallel spin chains to send a qubit with high fidelity over arbitrary distances, in particular, distances much greater than the localization length of the chain.

  13. Parallel algorithms for mapping pipelined and parallel computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm³) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm²) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.

  14. Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

    With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the poor scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over a 20-fold reduction in the run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.

  15. Parallel computing in genomic research: advances and applications

    PubMed Central

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

    Today’s genomic experiments have to process the so-called “biological big data” that is now reaching the size of terabytes and petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature surveying the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. PMID:26604801

  16. Parallel computing in genomic research: advances and applications.

    PubMed

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

    Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of terabytes and petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature surveying the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.

  17. Communication library for run-time visualization of distributed, asynchronous data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rowlan, J.; Wightman, B.T.

    1994-04-01

    In this paper we present a method for collecting and visualizing data generated by a parallel computational simulation during run time. Data distributed across multiple processes is sent across parallel communication lines to a remote workstation, which sorts and queues the data for visualization. We have implemented our method in a set of tools called PORTAL (for Parallel aRchitecture data-TrAnsfer Library). The tools comprise generic routines for sending data from a parallel program (callable from either C or FORTRAN), a semi-parallel communication scheme currently built upon Unix sockets, and a real-time connection to the scientific visualization program AVS. Our method is most valuable when used to examine large datasets that can be efficiently generated and do not need to be stored on disk. The PORTAL source libraries, detailed documentation, and a working example can be obtained by anonymous ftp from info.mcs.anl.gov (file portal.tar.Z in directory pub/portal).

  18. Topology of polymer chains under nanoscale confinement.

    PubMed

    Satarifard, Vahid; Heidari, Maziar; Mashaghi, Samaneh; Tans, Sander J; Ejtehadi, Mohammad Reza; Mashaghi, Alireza

    2017-08-24

    Spatial confinement limits the conformational space accessible to biomolecules, but the implications for biomolecular topology are not yet known. Folded linear biopolymers can be seen as molecular circuits formed by intramolecular contacts. The pairwise arrangement of intra-chain contacts can be categorized as parallel, series or cross, and has been identified as a topological property. Using molecular dynamics simulations, we determine the contact order distributions and topological circuits of short semi-flexible linear and ring polymer chains with a persistence length l_p under a spherical confinement of radius R_c. At low values of l_p/R_c, the entropy of the linear chain leads to the formation of independent contacts along the chain and accordingly increases the fraction of series topology with respect to other topologies. However, at high l_p/R_c, the fractions of cross and parallel topologies are enhanced in the chain topological circuits, with cross becoming predominant. At an intermediate confining regime, we identify a critical value of l_p/R_c at which all topological states have equal probability. Confinement thus equalizes the probability of the more complex cross and parallel topologies to the level of the simpler, non-cooperative series topology. Moreover, our topology analysis reveals distinct behaviours for ring and linear polymers under weak confinement; however, we find no difference between ring and linear polymers under strong confinement. Under weak confinement, ring polymers adopt parallel and series topologies with equal likelihood, while linear polymers show a higher tendency for series arrangement. The radial distribution analysis of the topology reveals a non-uniform effect of confinement on the topology of polymer chains, imposing more pronounced effects on the core region than on the confinement surface. Additionally, our results reveal that over a wide range of confining radii, loops arranged in parallel and cross topologies have nearly the same contact orders. Such degeneracy implies that the kinetics and transition rates between the topological states cannot be explained solely by contact order. We expect these findings to be of general importance in understanding chaperone-assisted protein folding, chromosome architecture, and the evolution of molecular folds.

  19. PARLO: PArallel Run-Time Layout Optimization for Scientific Data Explorations with Heterogeneous Access Pattern

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gong, Zhenhuan; Boyuka, David; Zou, X

    The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement, while limiting the performance impact on running applications to a reasonable level.

  20. Study of Thread Level Parallelism in a Video Encoding Application for Chip Multiprocessor Design

    NASA Astrophysics Data System (ADS)

    Debes, Eric; Kaine, Greg

    2002-11-01

    In media applications there is a high level of available thread level parallelism (TLP). In this paper we study the intra TLP in a video encoder. We show that a well-distributed, highly optimized encoder running on a symmetric multiprocessor (SMP) system can run 3.2 times faster on a 4-way SMP machine than on a single processor. The multithreaded encoder running on an SMP system is then used to understand the requirements of a chip multiprocessor (CMP) architecture, which is one possible architectural direction to better exploit TLP. In the framework of this study, we use a software approach to evaluate the dataflow between processors for the video encoder running on an SMP system. An estimation of the dataflow is done with L2 cache miss event counters using the Intel® VTune™ performance analyzer. The experimental measurements are compared to theoretical results.

  1. A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data.

    PubMed

    Liang, Faming; Kim, Jinsu; Song, Qifan

    2016-01-01

    Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically requires a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for taming powerful MCMC methods for big data analysis; that is, to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively.
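
    The core BMH step is easy to sketch: the full-data log-likelihood in the Metropolis-Hastings ratio is replaced by an average of log-likelihoods over bootstrap samples, each evaluated on a separate processor. A toy Gaussian-mean version follows; the model, constants, and rescaling are illustrative assumptions, not the paper's exact estimator:

      import numpy as np
      from multiprocessing import Pool

      rng = np.random.default_rng(1)
      full_data = rng.normal(3.0, 1.0, size=100_000)
      k, m = 8, 2_000                              # k bootstrap samples of size m
      boots = [rng.choice(full_data, size=m) for _ in range(k)]

      def loglik(args):
          theta, sample = args                     # rescale to the full-data size
          return (full_data.size / sample.size) * -0.5 * np.sum((sample - theta) ** 2)

      def bmh_logpost(theta, pool):
          # Monte Carlo average of bootstrap log-likelihoods, computed in parallel
          return float(np.mean(pool.map(loglik, [(theta, b) for b in boots])))

      if __name__ == "__main__":
          with Pool(k) as pool:
              theta = float(np.mean(boots[0]))     # start near the data mean
              lp, chain = bmh_logpost(theta, pool), []
              for _ in range(2_000):
                  prop = theta + rng.normal(0.0, 0.005)
                  lp_prop = bmh_logpost(prop, pool)
                  if np.log(rng.uniform()) < lp_prop - lp:
                      theta, lp = prop, lp_prop
                  chain.append(theta)
          print("posterior mean estimate:", np.mean(chain[500:]))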

  2. A Bootstrap Metropolis–Hastings Algorithm for Bayesian Analysis of Big Data

    PubMed Central

    Kim, Jinsu; Song, Qifan

    2016-01-01

    Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically requires a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for taming powerful MCMC methods for big data analysis; that is, to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively. PMID:29033469

  3. Scalable descriptive and correlative statistics with Titan.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, David C.; Pebay, Philippe Pierre

    This report summarizes the existing statistical engines in VTK/Titan and presents the parallel versions thereof which have already been implemented. The ease of use of these parallel engines is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; this theoretical property is then verified with test runs that demonstrate optimal parallel speed-up with up to 200 processors.
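
    Parallel scalability for descriptive statistics typically comes from reducing each data block to a small aggregate and merging aggregates associatively. A sketch of that pattern in Python; the pairwise mean/variance update follows Chan et al., and Titan's actual C++ engines may differ in detail:

      from functools import reduce
      from multiprocessing import Pool
      import numpy as np

      def block_stats(block):
          # Reduce one block to (count, mean, sum of squared deviations)
          return block.size, block.mean(), float(((block - block.mean()) ** 2).sum())

      def merge(a, b):
          # Associative merge of two aggregates (Chan et al. update)
          na, ma, m2a = a
          nb, mb, m2b = b
          n = na + nb
          delta = mb - ma
          return n, ma + delta * nb / n, m2a + m2b + delta * delta * na * nb / n

      if __name__ == "__main__":
          data = np.random.default_rng(2).normal(size=1_000_000)
          blocks = np.array_split(data, 200)       # stand-in for 200 processors
          with Pool() as pool:
              n, mean, m2 = reduce(merge, pool.map(block_stats, blocks))
          print(f"n={n}, mean={mean:.6f}, variance={m2 / (n - 1):.6f}")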

  4. ProperCAD: A portable object-oriented parallel environment for VLSI CAD

    NASA Technical Reports Server (NTRS)

    Ramkumar, Balkrishna; Banerjee, Prithviraj

    1993-01-01

    Most parallel algorithms for VLSI CAD proposed to date have one important drawback: they work efficiently only on the machines that they were designed for. As a result, algorithms designed to date are dependent on the architecture for which they are developed and do not port easily to other parallel architectures. A new project under way to address this problem is described. A portable object-oriented parallel environment for CAD algorithms (ProperCAD) is being developed. The objectives of this research are (1) to develop new parallel algorithms that run in a portable object-oriented environment (CAD algorithms are being developed on a general-purpose platform for portable parallel programming called CARM, along with a C++ environment that is truly object-oriented and specialized for CAD applications); and (2) to design the parallel algorithms around a good sequential algorithm with a well-defined parallel-sequential interface (permitting the parallel algorithm to benefit from future developments in sequential algorithms). One CAD application that has been implemented as part of the ProperCAD project, flat VLSI circuit extraction, is described. The algorithm, its implementation, and its performance on a range of parallel machines are discussed in detail. It currently runs on an Encore Multimax, a Sequent Symmetry, Intel iPSC/2 and i860 hypercubes, an NCUBE 2 hypercube, and a network of Sun Sparc workstations. Performance data are provided for other applications that were developed, namely test pattern generation for sequential circuits, parallel logic synthesis, and standard cell placement.

  5. Affinity-reversed-phase liquid chromatography assay to quantitate recombinant antibodies and antibody fragments in fermentation broth.

    PubMed

    Battersby, J E; Snedecor, B; Chen, C; Champion, K M; Riddle, L; Vanderlaan, M

    2001-08-24

    An automated dual-column liquid chromatography assay comprised of affinity and reversed-phase separations that quantifies the majority of antibody-related protein species found in crude cell extracts of recombinant origin is described. Although potentially applicable to any antibody preparation, we here use samples of anti-CD18 (Fab'2LZ) and a full-length antibody, anti-tissue factor (anti-TF), from various stages throughout a biopharmaceutical production process to describe the assay details. The targeted proteins were captured on an affinity column containing an anti-light-chain (kappa) Fab antibody (AME5) immobilized on controlled pore glass. The affinity column was placed in-line with a reversed-phase column and the captured components were transferred by elution with dilute acid and subsequently resolved by eluting the reversed-phase column with a shallow acetonitrile gradient. Characterization of the resolved components showed that most antibody fragment preparations contained a light-chain fragment, free light chain, light-chain dimer and multiple forms of Fab'. Analysis of full-length antibody preparations also resolved these fragments as well as a completely assembled form. Co-eluting with the full-length antibody were high-molecular-mass variants that were missing one or both light chains. Resolved components were quantified by comparison with peak areas of similarly treated standards. By comparing the two-dimensional polyacrylamide gel electrophoresis patterns of an Escherichia coli blank run, a production run and the material affinity captured (AME5) from a production run, it was determined that the AME5 antibody captured isoforms of light chain, light chain covalently attached to heavy chain, and truncated light chain isoforms. These forms comprise the bulk of the soluble product-related fragments found in E. coli cell extracts of recombinantly produced antibody fragments.

  6. Development for SSV on a parallel processing system (PARAGON)

    NASA Astrophysics Data System (ADS)

    Gothard, Benny M.; Allmen, Mark; Carroll, Michael J.; Rich, Dan

    1995-12-01

    A goal of the surrogate semi-autonomous vehicle (SSV) program is to have multiple vehicles navigate autonomously and cooperatively with other vehicles. This paper describes the process and tools used in porting UGV/SSV (unmanned ground vehicle) autonomous mobility and target recognition algorithms from a SISD (single instruction single data) processor architecture (i.e., a Sun SPARC workstation running C/UNIX) to a MIMD (multiple instruction multiple data) parallel processor architecture (i.e., PARAGON, a parallel set of i860 processors running C/UNIX). It discusses the gains in performance and the pitfalls of such a venture. It also examines the merits of this processor architecture (based on this conceptual prototyping effort) and programming paradigm to meet the final SSV demonstration requirements.

  7. Parallelization and automatic data distribution for nuclear reactor simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  8. DNA motif alignment by evolving a population of Markov chains.

    PubMed

    Bi, Chengpeng

    2009-01-30

    Deciphering cis-regulatory elements, or de novo motif-finding in genomes, still remains elusive although much algorithmic effort has been expended. Markov chain Monte Carlo (MCMC) methods such as Gibbs motif samplers have been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima, like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. Hence a new algorithm design enabling such information exchange would be worthwhile. This paper presents a novel motif-finding algorithm that evolves a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of stochastically sampled local alignments. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, dubbed the IMC motif algorithm, is also devised for comparison with its PMC counterpart. Experimental studies demonstrate that performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.
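
    A drastically simplified, non-genomic illustration of the population idea: several Metropolis-Hastings chains explore a common target, and an occasional exchange move proposes from the pooled current states of the population. The real PMC algorithm pools motif site distributions, and a strictly correct sampler would also include the proposal density ratio omitted here:

      import numpy as np

      def logp(x):
          # Toy bimodal target standing in for a motif alignment score surface
          return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

      rng = np.random.default_rng(3)
      pop = rng.normal(size=16)                    # population of 16 chains
      for _ in range(5_000):
          for i in range(pop.size):
              if rng.uniform() < 0.1:              # exchange move: pooled proposal
                  prop = rng.choice(pop) + rng.normal(0.0, 0.5)
              else:                                # local random-walk move
                  prop = pop[i] + rng.normal(0.0, 0.5)
              # acceptance kept simple for illustration (see caveat above)
              if np.log(rng.uniform()) < logp(prop) - logp(pop[i]):
                  pop[i] = prop
      print("fraction of chains in the positive mode:", np.mean(pop > 0))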

  9. Self-Scheduling Parallel Methods for Multiple Serial Codes with Application to WOPWOP

    NASA Technical Reports Server (NTRS)

    Long, Lyle N.; Brentner, Kenneth S.

    2000-01-01

    This paper presents a scheme for efficiently running a large number of serial jobs on parallel computers. Two examples are given of computer programs that run relatively quickly, but often they must be run numerous times to obtain all the results needed. It is very common in science and engineering to have codes that are not massive computing challenges in themselves, but due to the number of instances that must be run, they do become large-scale computing problems. The two examples given here represent common problems in aerospace engineering: aerodynamic panel methods and aeroacoustic integral methods. The first example simply solves many systems of linear equations. This is representative of an aerodynamic panel code where someone would like to solve for numerous angles of attack. The complete code for this first example is included in the appendix so that it can be readily used by others as a template. The second example is an aeroacoustics code (WOPWOP) that solves the Ffowcs Williams Hawkings equation to predict the far-field sound due to rotating blades. In this example, one quite often needs to compute the sound at numerous observer locations, hence parallelization is utilized to automate the noise computation for a large number of observers.
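
    The self-scheduling pattern of the first example is straightforward to sketch: a worker pool pulls the next independent linear solve (one per angle of attack) as soon as a worker frees up. This Python version is a stand-in for the paper's own template (which appears in its appendix); the sizes and right-hand sides are illustrative:

      import numpy as np
      from multiprocessing import Pool

      N_PANELS, N_ANGLES = 200, 64
      rng = np.random.default_rng(4)
      A = rng.normal(size=(N_PANELS, N_PANELS)) + N_PANELS * np.eye(N_PANELS)

      def solve_case(angle_deg):
          # One serial job: a toy right-hand side for this angle of attack
          b = np.cos(np.deg2rad(angle_deg)) * np.ones(N_PANELS)
          return angle_deg, np.linalg.solve(A, b)

      if __name__ == "__main__":
          with Pool() as pool:
              # imap_unordered hands each idle worker the next job: self-scheduling
              for angle, x in pool.imap_unordered(solve_case, range(N_ANGLES)):
                  print(f"angle {angle:3d} deg: |x| = {np.linalg.norm(x):.4f}")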

  10. GPU accelerated cell-based adaptive mesh refinement on unstructured quadrilateral grid

    NASA Astrophysics Data System (ADS)

    Luo, Xisheng; Wang, Luying; Ran, Wei; Qin, Fenghua

    2016-10-01

    A GPU accelerated inviscid flow solver is developed on an unstructured quadrilateral grid in the present work. For the first time, the cell-based adaptive mesh refinement (AMR) is fully implemented on GPU for the unstructured quadrilateral grid, which greatly reduces the frequency of data exchange between GPU and CPU. Specifically, the AMR is processed with atomic operations to parallelize list operations, and null memory recycling is realized to improve the efficiency of memory utilization. It is found that results obtained by GPUs agree very well with the exact or experimental results in the literature. An acceleration ratio of 4 is obtained between the parallel code running on the old GPU GT9800 and the serial code running on an E3-1230 V2. With the optimization of configuring a larger L1 cache and adopting Shared Memory based atomic operations on the newer GPU C2050, an acceleration ratio of 20 is achieved. The parallelized cell-based AMR processes have achieved a 2x speedup on the GT9800 and 18x on the Tesla C2050, which demonstrates that parallel running of the cell-based AMR method on GPU is feasible and efficient. Our results also indicate that new developments in GPU architecture benefit fluid dynamics computing significantly.

  11. Scalable Domain Decomposed Monte Carlo Particle Transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Brien, Matthew Joseph

    2013-12-05

    In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation.

  12. PyPele Rewritten To Use MPI

    NASA Technical Reports Server (NTRS)

    Hockney, George; Lee, Seungwon

    2008-01-01

    A computer program known as PyPele, originally written as a Python-language extension module of a C++ language program, has been rewritten in pure Python. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses the Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message-passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without needing to rewrite those programs.
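
    The user-visible pattern is plain scatter/compute/gather. A minimal mpi4py sketch of that flow (function and task names are hypothetical, not PyPele's API):

      from mpi4py import MPI
      import math

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      # Rank 0 builds one task list per rank; every rank receives its share
      tasks = [list(range(r, 1_000, size)) for r in range(size)] if rank == 0 else None
      my_tasks = comm.scatter(tasks, root=0)

      my_result = sum(math.sqrt(t) for t in my_tasks)  # embarrassingly parallel work
      results = comm.gather(my_result, root=0)

      if rank == 0:
          print("total:", sum(results))

    Run with, e.g., mpiexec -n 8 python sketch.py (the script name is arbitrary).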

  13. SAChES: Scalable Adaptive Chain-Ensemble Sampling.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Swiler, Laura Painton; Ray, Jaideep; Ebeida, Mohamed Salah

    We present the development of a parallel Markov Chain Monte Carlo (MCMC) method called SAChES, Scalable Adaptive Chain-Ensemble Sampling. This capability is targeted at Bayesian calibration of computationally expensive simulation models. SAChES involves a hybrid of two methods: Differential Evolution Monte Carlo followed by Adaptive Metropolis. Both methods involve parallel chains. Differential evolution allows one to explore high-dimensional parameter spaces using loosely coupled (i.e., largely asynchronous) chains. Loose coupling allows the use of large chain ensembles, with far more chains than the number of parameters to explore. This reduces the per-chain sampling burden and enables high-dimensional inversions and the use of computationally expensive forward models. The large number of chains can also ameliorate the impact of silent errors, which may affect only a few chains. The chain ensemble can also be sampled to provide an initial condition when an aberrant chain is re-spawned. Adaptive Metropolis takes the best points from the differential evolution and efficiently homes in on the posterior density. The multitude of chains in SAChES is leveraged to (1) enable efficient exploration of the parameter space; and (2) ensure robustness to silent errors, which may be unavoidable in extreme-scale computational platforms of the future. This report outlines SAChES, describes four papers that are the result of the project, and discusses some additional results.

  14. Static analysis of the hull plate using the finite element method

    NASA Astrophysics Data System (ADS)

    Ion, A.

    2015-11-01

    This paper presents the static analysis for two levels of a container ship's construction: the first level is the girder / hull plate, and the second level is the entire strength hull of the vessel. This article describes the work for the static analysis of a hull plate. We use the software package ANSYS Mechanical 14.5. The program is run on a computer with four Intel Xeon X5260 CPU processors at 3.33 GHz and 32 GB of installed memory. In terms of software, the shared memory parallel version of ANSYS refers to running ANSYS across multiple cores on an SMP system. The distributed memory parallel version of ANSYS (Distributed ANSYS) refers to running ANSYS across multiple processors on SMP or DMP systems.

  15. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
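
    At its core, such a simulator stores 2^n complex amplitudes and applies a one-qubit gate to amplitude pairs whose indices differ in a single bit; distributing that vector across processors is what the paper's software does at scale. A tiny serial sketch of a Hadamard gate (no MPI distribution, purely illustrative):

      import numpy as np

      def apply_hadamard(state, k, n):
          # Apply H to qubit k of an n-qubit state vector, in place
          h = 1.0 / np.sqrt(2.0)
          step = 1 << k
          for base in range(0, 1 << n, step << 1):
              for off in range(step):
                  a, b = base + off, base + off + step
                  state[a], state[b] = h * (state[a] + state[b]), h * (state[a] - state[b])
          return state

      n = 3
      state = np.zeros(1 << n, dtype=complex)
      state[0] = 1.0                       # start in |000>
      for q in range(n):
          apply_hadamard(state, q, n)      # uniform superposition over 8 states
      print(np.round(state, 3))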

  16. A comparison of five benchmarks

    NASA Technical Reports Server (NTRS)

    Huss, Janice E.; Pennline, James A.

    1987-01-01

    Five benchmark programs were obtained and run on the NASA Lewis CRAY X-MP/24. A comparison was made between the programs' codes and between the methods for calculating performance figures. Several multitasking jobs were run to gain experience in how parallel performance is measured.

  17. DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

    PubMed

    Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

    2004-09-09

    Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
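
    Strategy (a) is simple to sketch: all sequence pairs are scored independently, so they can be farmed out to a process pool. The identity score below is a stand-in for DIALIGN's segment-based pairwise alignment:

      from itertools import combinations
      from multiprocessing import Pool

      SEQS = {"s1": "ACGTGCA", "s2": "ACGTCCA", "s3": "TTGTGCA", "s4": "ACGAGCA"}

      def pair_score(pair):
          # Score one sequence pair; pairs are independent, hence parallelizable
          (na, a), (nb, b) = pair
          return na, nb, sum(x == y for x, y in zip(a, b))

      if __name__ == "__main__":
          with Pool() as pool:
              for na, nb, s in pool.map(pair_score, combinations(SEQS.items(), 2)):
                  print(f"{na}-{nb}: {s} matching positions")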

  18. Scalable Domain Decomposed Monte Carlo Particle Transport

    NASA Astrophysics Data System (ADS)

    O'Brien, Matthew Joseph

    In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation. The main algorithms we consider are:

    • Domain decomposition of constructive solid geometry: enables extremely large calculations in which the background geometry is too large to fit in the memory of a single computational node.
    • Load balancing: keeps the workload per processor as even as possible so the calculation runs efficiently.
    • Global particle find (sketched below): if particles are on the wrong processor, globally resolve their locations to the correct processor based on particle coordinate and background domain.
    • Visualizing constructive solid geometry, sourcing particles, deciding that particle streaming communication is completed, and spatial redecomposition.

    These algorithms are some of the most important parallel algorithms required for domain decomposed Monte Carlo particle transport. We demonstrate that our previous algorithms were not scalable, prove that our new algorithms are scalable, and run some of the algorithms up to 2 million MPI processes on the Sequoia supercomputer.
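
    A hedged mpi4py sketch of the global particle find step, assuming a 1-D slab decomposition of [0, 1) (the dissertation's constructive solid geometry handling is far more general): each rank bins stray particles by coordinate, and an all-to-all exchange delivers them to the owning rank.

      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      rng = np.random.default_rng(rank)
      particles = rng.uniform(0.0, 1.0, size=1000)      # global domain [0, 1)
      owner = np.minimum((particles * size).astype(int), size - 1)

      # One outgoing bin per destination rank, routed in a single exchange
      outgoing = [particles[owner == r].tolist() for r in range(size)]
      incoming = comm.alltoall(outgoing)
      mine = [x for box in incoming for x in box]
      print(f"rank {rank} now owns {len(mine)} particles")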

  19. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    NASA Astrophysics Data System (ADS)

    Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

    2015-12-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.

  20. Pit-chain in Noctis Labyrinthus

    NASA Image and Video Library

    2002-12-20

    The pit-chain features in this NASA Mars Odyssey image of south Noctis Labyrinthus are oriented parallel to grabens in the area, suggesting that tensional stresses may have been responsible for their formation.

  1. Parallel computing for automated model calibration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burke, John S.; Danielson, Gary R.; Schulz, Douglas A.

    2002-07-29

    Natural resources model calibration is a significant burden on computing and staff resources in modeling efforts. Most assessments must consider multiple calibration objectives (for example, magnitude and timing of stream flow peak). An automated calibration process that allows real-time updating of data/models and lets scientists focus their effort on improving models is needed. We are in the process of building a fully featured multi-objective calibration tool capable of processing multiple models cheaply and efficiently using null cycle computing. Our parallel processing and calibration software routines have been written generically, but our focus has been on natural resources model calibration. So far, the natural resources models have been friendly to parallel calibration efforts in that they require no inter-process communication, need only a small amount of input data, and output only a small amount of statistical information for each calibration run. A typical auto calibration run might involve running a model 10,000 times with a variety of input parameters and summary statistical output. In the past, model calibration has been done against individual models for each data set. The individual model runs are relatively fast, ranging from seconds to minutes. The process was run on a single computer using a simple iterative process. We have completed two auto calibration prototypes and are currently designing a more feature-rich tool. Our prototypes have focused on running the calibration in a distributed computing, cross-platform environment. They allow incorporation of "smart" calibration parameter generation (using artificial intelligence processing techniques). Null cycle computing similar to SETI@Home has also been a focus of our efforts. This paper details the design of the latest prototype and discusses our plans for the next revision of the software.

  2. A Mixed-Valent Molybdenum Monophosphate with a Layer Structure: KMo3P2O14

    NASA Astrophysics Data System (ADS)

    Guesdon, A.; Borel, M. M.; Leclaire, A.; Grandin, A.; Raveau, B.

    1994-03-01

    A new mixed-valent molybdenum monophosphate with a layer structure, KMo3P2O14, has been isolated. It crystallizes in the space group P2₁/m with a = 8.599(2) Å, b = 6.392(2) Å, c = 10.602(1) Å, and β = 111.65(2)°. The [Mo3P2O14]∞ layers are parallel to (100) and consist of [MoPO8]∞ chains running along the b axis, in which one MoO6 octahedron alternates with one PO4 tetrahedron. In fact, four [MoPO8]∞ chains share the corners of their polyhedra and the edges of their octahedra, forming [Mo4P4O24]∞ columns which are linked through MoO5 bipyramids along the c axis. The K+ ions interleaved between these layers are surrounded by eight oxygens, forming bicapped trigonal prisms KO8. Besides the unusual trigonal bipyramids MoO5, this structure is also characterized by a tendency toward localization of the electrons, since one octahedral site is occupied by Mo(V), whereas the other octahedral site and the trigonal bipyramid are occupied by Mo(VI). The similarity of this structure with pure octahedral layer structures suggests the possibility of generating various derivatives, and of ion exchange properties.

  3. 4,4′-Bipyridinium bis(perchlorate)–4-aminobenzoic acid–4,4′-bipyridine–water (1/4/2/2)

    PubMed Central

    Meng, Qun-Hui; Han, Lu; Hou, Jian-Dong; Luo, Yi-Fan; Zeng, Rong-Hua

    2009-01-01

    In the structure of the title compound, C10H10N2²⁺·2ClO4⁻·4C7H7NO2·2C10H8N2·2H2O, the 4,4′-bipyridinium cation has a crystallographically imposed centre of symmetry. The cation is linked by N—H⋯N hydrogen bonds to adjacent 4,4′-bipyridine molecules, which in turn interact via O—H⋯N hydrogen bonds with 4-aminobenzoic acid molecules, forming chains running parallel to [30]. The chains are further connected into a three-dimensional network by N—H⋯O and O—H⋯O hydrogen-bonding interactions involving the perchlorate anion, the water molecules and the 4-aminobenzoic acid molecules. In addition, π–π stacking interactions with centroid–centroid distances ranging from 3.663 (6) to 3.695 (6) Å are present. The O atoms of the perchlorate anion are disordered over two sets of positions, with refined site occupancies of 0.724 (9) and 0.276 (9). PMID:21581593

  4. New metal oxides of the family Am[(TO)q]: ALiMn3O4 and ALiZn3O4 (A = K, Rb)

    NASA Astrophysics Data System (ADS)

    Hoppe, R.; Seipp, E.; Baier, R.

    1988-01-01

    The new compounds KLiMn3O4 (I), RbLiMn3O4 (II), KLiZn3O4 (III) and RbLiZn3O4 (IV) have been prepared by solid state reaction of A2O (A = K, Rb), Li2O, and MO (M = Mn, Zn). The isomorphous compounds are tetragonal, space group I4/m, Z = 2, with lattice constants a = 838.32(4) pm, c = 341.88(3) pm for I; a = 840.66(8) pm, c = 344.85(4) pm for II; a = 819.27(9) pm, c = 334.20(7) pm for III; a = 823.62(9) pm, c = 339.73(7) pm for IV, as determined from Guinier X-ray powder patterns. The orange-colored manganates and colorless zincates are sensitive to moisture. The crystal structures of II and III have been determined by single-crystal X-ray techniques and refined to R = 0.09 (II) and R = 0.06 (III). The structure is built up from chains of face-shared cubes, ∞[AO8/2] (A = K, Rb), running parallel to the c axis. These are connected by Li+ and M2+ (M = Mn, Zn), statistically distributed on tetrahedral positions between the chains.

  5. Using the Parallel Computing Toolbox with MATLAB on the Peregrine System

    Science.gov Websites

    % (the page's example script opens a pool and times it with tic/toc)
    tic; parpool;
    fprintf('Starting the parallel pool took %g seconds.\n', toc)
    % "single program multiple data"
    spmd
        fprintf('Worker %d says Hello World!\n', labindex)
    end
    delete(gcp);   % close the parallel pool
    exit

    To run the script on a compute node, create the file helloWorld.sub:

    #!/bin/bash
    #PBS -l walltime=05:00
    #PBS -l nodes=1
    #PBS -N

  6. Morphology of poly-p-xylylene crystallized during polymerization.

    NASA Technical Reports Server (NTRS)

    Kubo, S.; Wunderlich, B.

    1971-01-01

    The morphology of as-polymerized poly-p-xylylene grown between -17 and 30 C is found to consist of lamellar alpha crystals oriented with the (010) plane parallel to the support surface. The crystallinity decreases with decreasing polymerization temperature. Spherulitic and nonspherulitic portions of the polymer film consist of folded-chain lamellae with the chain axis parallel to the support surface. The results were obtained by small- and wide-angle X-ray measurements, electron and optical microscopy, and differential thermal analysis.

  7. Parallel algorithms for simulating continuous time Markov chains

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
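
    For concreteness, a minimal serial sketch of uniformization: pick a uniform rate L ≥ max_i |Q_ii|, drive the chain with a Poisson(L) event clock, and at each event step a DTMC with transition matrix P = I + Q/L. Parallel schemes like those compared in the paper synchronize chains on such a common event stream; the generator below is an arbitrary example:

      import numpy as np

      Q = np.array([[-2.0, 2.0, 0.0],      # generator of a small 3-state CTMC
                    [1.0, -3.0, 2.0],
                    [0.0, 1.0, -1.0]])
      L = np.max(-np.diag(Q))              # uniformization rate
      P = np.eye(Q.shape[0]) + Q / L       # DTMC embedded at Poisson events

      rng = np.random.default_rng(5)
      t, state, T = 0.0, 0, 100.0
      visits = np.zeros(3)
      while t < T:
          dt = rng.exponential(1.0 / L)    # time to the next Poisson event
          visits[state] += min(dt, T - t)  # accumulate occupancy time
          t += dt
          state = rng.choice(3, p=P[state])
      print("occupancy fractions:", visits / T)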

  8. Increasing airport capacity with modified IFR approach procedures for close-spaced parallel runways

    DOT National Transportation Integrated Search

    2001-01-01

    Because of wake turbulence considerations, current instrument approach procedures treat close-spaced (i.e., less than 2,500 feet apart) parallel runways as a single runway. This restriction is designed to assure safety for all aircraft types u...

  9. Parallel Computation of the Regional Ocean Modeling System (ROMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, P; Song, Y T; Chao, Y

    2005-04-05

    The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.

  10. ParallelStructure: A R Package to Distribute Parallel Runs of the Population Genetics Program STRUCTURE on Multi-Core Computers

    PubMed Central

    Besnier, Francois; Glover, Kevin A.

    2013-01-01

    This software package provides an R-based framework for making use of multi-core computers when running analyses in the population genetics program STRUCTURE. It is especially aimed at those users of STRUCTURE dealing with numerous and repeated data analyses who could take advantage of an efficient script to automatically distribute STRUCTURE jobs among multiple processors. It also includes functions to divide analyses among combinations of populations within a single data set without the need to manually produce multiple projects, as is currently the case in STRUCTURE. The package consists of two main functions, MPI_structure() and parallel_structure(), as well as an example data file. We compared the performance in computing time for this example data on two computer architectures and showed that use of the present functions can result in several-fold improvements in computation time. ParallelStructure is freely available at https://r-forge.r-project.org/projects/parallstructure/. PMID:23923012
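
    The pattern the package automates is independent external runs dispatched to a pool. In miniature, and in Python rather than R (the binary name and flags below are placeholders, not ParallelStructure's actual call):

      import subprocess
      from itertools import product
      from multiprocessing import Pool

      def run_structure(job):
          # One independent STRUCTURE run; cmd is a placeholder invocation
          k, rep = job
          cmd = ["structure", "-K", str(k), "-o", f"results_K{k}_rep{rep}"]
          return job, subprocess.run(cmd, capture_output=True).returncode

      if __name__ == "__main__":
          jobs = list(product(range(1, 6), range(3)))   # K = 1..5, 3 replicates
          with Pool(4) as pool:
              for (k, rep), rc in pool.imap_unordered(run_structure, jobs):
                  print(f"K={k} rep={rep} exit={rc}")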

  11. Circuit topology of self-interacting chains: implications for folding and unfolding dynamics.

    PubMed

    Mugler, Andrew; Tans, Sander J; Mashaghi, Alireza

    2014-11-07

    Understanding the relationship between molecular structure and folding is a central problem in disciplines ranging from biology to polymer physics and DNA origami. Topology can be a powerful tool to address this question. For a folded linear chain, the arrangement of intra-chain contacts is a topological property because rearranging the contacts requires discontinuous deformations. Conversely, the topology is preserved when continuously stretching the chain while maintaining the contact arrangement. Here we investigate how the folding and unfolding of linear chains with binary contacts is guided by the topology of contact arrangements. We formalize the topology by describing the relations between any two contacts in the structure, which for a linear chain can either be in parallel, in series, or crossing each other. We show that even when other determinants of folding rate such as contact order and size are kept constant, this 'circuit' topology determines folding kinetics. In particular, we find that the folding rate increases with the fractions of parallel and crossed relations. Moreover, we show how circuit topology constrains the conformational phase space explored during folding and unfolding: the number of forbidden unfolding transitions is found to increase with the fraction of parallel relations and to decrease with the fraction of series relations. Finally, we find that circuit topology influences whether distinct intermediate states are present, with crossed contacts being the key factor. The approach presented here can be more generally applied to questions on molecular dynamics, evolutionary biology, molecular engineering, and single-molecule biophysics.
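
    The parallel/series/cross relation between two contacts is simple enough to state in code: for contacts (i1, j1) and (i2, j2) with i < j on the chain, nesting gives parallel, disjoint intervals give series, and interleaving gives cross (shared-endpoint cases are ignored in this sketch):

      def relation(c1, c2):
          # Classify the circuit-topology relation of two chain contacts
          (i1, j1), (i2, j2) = sorted([c1, c2])  # order by first contact site
          if j1 < i2:
              return "series"                    # [i1..j1] then [i2..j2]
          if j2 < j1:
              return "parallel"                  # [i2..j2] nested inside [i1..j1]
          return "cross"                         # intervals interleave

      print(relation((1, 4), (6, 9)))   # series
      print(relation((1, 9), (3, 5)))   # parallel
      print(relation((1, 6), (3, 9)))   # cross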

  12. Memory access in shared virtual memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berrendorf, R.

    1992-01-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.

  13. Memory access in shared virtual memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berrendorf, R.

    1992-09-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.

  14. Drug innovation, price controls, and parallel trade.

    PubMed

    Matteucci, Giorgio; Reverberi, Pierfrancesco

    2016-12-21

    We study the long-run welfare effects of parallel trade (PT) in pharmaceuticals. We develop a two-country model of PT with endogenous quality, where the pharmaceutical firm negotiates the price of the drug with the government in the foreign country. We show that, even though the foreign government does not consider global R&D costs, (the threat of) PT improves the quality of the drug as long as the foreign consumers' valuation of quality is high enough. We find that the firm's short-run profit may be higher when PT is allowed. Nonetheless, this is neither necessary nor sufficient for improving drug quality in the long run. We also show that improving drug quality is a sufficient condition for PT to increase global welfare. Finally, we show that, when PT is allowed, drug quality may be higher with than without price controls.

  15. Learning and Parallelization Boost Constraint Search

    ERIC Educational Resources Information Center

    Yun, Xi

    2013-01-01

    Constraint satisfaction problems are a powerful way to abstract and represent academic and real-world problems from both artificial intelligence and operations research. A constraint satisfaction problem is typically addressed by a sequential constraint solver running on a single processor. Rather than construct a new, parallel solver, this work…

  16. The Automated Instrumentation and Monitoring System (AIMS) reference manual

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Hontalas, Philip; Listgarten, Sherry

    1993-01-01

    Whether a researcher is designing the 'next parallel programming paradigm,' another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of execution traces can help computer designers and software architects to uncover system behavior and to take advantage of specific application characteristics and hardware features. A software tool kit that facilitates performance evaluation of parallel applications on multiprocessors is described. The Automated Instrumentation and Monitoring System (AIMS) has four major software components: a source code instrumentor, which automatically inserts active event recorders into the program's source code before compilation; a run-time performance-monitoring library, which collects performance data; a trace file animation and analysis tool kit, which reconstructs program execution from the trace file; and a trace post-processor, which compensates for data collection overhead. Besides being used as a prototype for developing new techniques for instrumenting, monitoring, and visualizing parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware test beds to evaluate their impact on user productivity. Currently, AIMS instrumentors accept FORTRAN and C parallel programs written for Intel's NX operating system on the iPSC family of multicomputers. A run-time performance-monitoring library for the iPSC/860 is included in this release. We plan to release monitors for other platforms (such as PVM and TMC's CM-5) in the near future. Performance data collected can be graphically displayed on workstations (e.g., Sun Sparc and SGI) supporting X-Windows (in particular, X11R5, Motif 1.1.3).

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dritz, K.W.; Boyle, J.M.

    This paper addresses the problem of measuring and analyzing the performance of fine-grained parallel programs running on shared-memory multiprocessors. Such processors use locking (either directly in the application program, or indirectly in a subroutine library or the operating system) to serialize accesses to global variables. Given sufficiently high rates of locking, the chief factor preventing linear speedup (besides lack of adequate inherent parallelism in the application) is lock contention: the blocking of processes that are trying to acquire a lock currently held by another process. We show how a high-resolution, low-overhead clock may be used to measure both lock contention and lack of parallel work. Several ways of presenting the results are covered, culminating in a method for calculating, in a single multiprocessing run, both the speedup actually achieved and the speedup lost to contention for each lock and to lack of parallel work. The speedup losses are reported in the same units, "processor-equivalents," as the speedup achieved. Both are obtained without having to perform the usual one-process comparison run. We also chronicle a variety of experiments motivated by actual results obtained with our measurement method. The insights into program performance that we gained from these experiments helped us to refine the parts of our programs concerned with communication and synchronization. Ultimately these improvements reduced lock contention to a negligible amount and yielded nearly linear speedup in applications not limited by lack of parallel work. We describe two generally applicable strategies ("code motion out of critical regions" and "critical-region fissioning") for reducing lock contention and one ("lock/variable fusion") applicable only on certain architectures.
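    The measurement idea, charging blocked time to individual locks with a high-resolution clock, can be sketched in Python (an analogy only; the paper's setting is fine-grained shared-memory code, not Python threads):

      # Wrap a lock so each acquisition records how long the caller was
      # blocked; the update to wait_total happens while the lock is held,
      # so it needs no extra synchronization.
      import threading, time

      class TimedLock:
          def __init__(self):
              self._lock = threading.Lock()
              self.wait_total = 0.0  # accumulated contention on this lock

          def __enter__(self):
              t0 = time.perf_counter()  # high-resolution clock
              self._lock.acquire()
              self.wait_total += time.perf_counter() - t0

          def __exit__(self, *exc):
              self._lock.release()

      counter_lock = TimedLock()
      counter = 0

      def worker(n):
          global counter
          for _ in range(n):
              with counter_lock:
                  counter += 1

      threads = [threading.Thread(target=worker, args=(100000,)) for _ in range(4)]
      for t in threads: t.start()
      for t in threads: t.join()
      print(f"time blocked on counter_lock: {counter_lock.wait_total:.4f}s")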

  18. Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations

    NASA Astrophysics Data System (ADS)

    Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.

    2012-09-01

    Computer simulations are important in current cosmological research. Those simulations run in parallel on thousands of processors and produce huge amounts of data. Adaptive mesh refinement (AMR) is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses octree adaptive mesh refinement. Compared to grid-based AMR, octree AMR has the advantage of fitting the adaptive resolution of the grid very precisely to the local problem complexity. However, this specific octree data type needs dedicated software to be visualized, as generic visualization tools work on Cartesian grid data types. This is why the PYMSES software has also been developed by our team. It relies on the Python scripting language to ensure modular and easy access for exploring these specific data. In order to take advantage of the high-performance computer that runs the RAMSES simulation, it also uses MPI and multiprocessing to run some parallel code. We present our PYMSES software in more detail, with some performance benchmarks. PYMSES currently has two visualization techniques which work directly on the AMR. The first is a splatting technique, and the second is a custom ray-tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques: the Python multiprocessing library versus MPI. The load-balancing strategy has to be defined carefully in order to achieve a good speed-up in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.

  19. Can parallel use of different running shoes decrease running-related injury risk?

    PubMed

    Malisoux, L; Ramesh, J; Mann, R; Seil, R; Urhausen, A; Theisen, D

    2015-02-01

    The aim of this study was to determine if runners who use concomitantly different pairs of running shoes are at a lower risk of running-related injury (RRI). Recreational runners (n = 264) participated in this 22-week prospective follow-up and reported all information about their running session characteristics, other sport participation and injuries on a dedicated Internet platform. A RRI was defined as a physical pain or complaint located at the lower limbs or lower back region, sustained during or as a result of running practice and impeding planned running activity for at least 1 day. One-third of the participants (n = 87) experienced at least one RRI during the observation period. The adjusted Cox regression analysis revealed that the parallel use of more than one pair of running shoes was a protective factor [hazard ratio (HR) = 0.614; 95% confidence interval (CI) = 0.389-0.969], while previous injury was a risk factor (HR = 1.722; 95%CI = 1.114-2.661). Additionally, increased mean session distance (km; HR = 0.795; 95%CI = 0.725-0.872) and increased weekly volume of other sports (h/week; HR = 0.848; 95%CI = 0.732-0.982) were associated with lower RRI risk. Multiple shoe use and participation in other sports are strategies potentially leading to a variation of the load applied to the musculoskeletal system. They could be advised to recreational runners to prevent RRI. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
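    For readers who want to reproduce this kind of adjusted analysis, a toy sketch with the lifelines package looks as follows (column names and data are invented stand-ins, not the study's variables; hazard ratios below 1 indicate protective factors):

      # Hedged sketch of an adjusted Cox proportional-hazards fit on toy data.
      import pandas as pd
      from lifelines import CoxPHFitter

      df = pd.DataFrame({
          "weeks_to_injury": [22, 8, 22, 15, 22, 5, 12, 22],
          "injured":         [0, 1, 0, 1, 0, 1, 1, 0],   # 1 = RRI observed
          "multiple_shoes":  [1, 0, 1, 1, 0, 0, 1, 0],
          "previous_injury": [0, 1, 0, 1, 1, 1, 0, 0],
      })
      cph = CoxPHFitter()
      cph.fit(df, duration_col="weeks_to_injury", event_col="injured")
      cph.print_summary()  # hazard ratios are exp(coef)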

  20. Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform.

    PubMed

    Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun

    2018-01-01

    The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance.
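    The serial kernel inside each map task can be pictured with a short OpenCV sketch (an interpretation of the abstract; the 0.5 low/high threshold ratio is an assumption, not a value from the paper):

      # Otsu picks the Canny high threshold; the low threshold is derived from it.
      import cv2

      def otsu_canny(path, low_ratio=0.5):
          img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
          # cv2.threshold with THRESH_OTSU returns the threshold it chose.
          high, _ = cv2.threshold(img, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
          return cv2.Canny(img, low_ratio * high, high)

      cv2.imwrite("edges.jpg", otsu_canny("sample.jpg"))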

  1. The Tera Multithreaded Architecture and Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Bokhari, Shahid H.; Mavriplis, Dimitri J.

    1998-01-01

    The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at the San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine-grained multithreading. The main memory is shared, hardware-randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2-processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared-memory machine) running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors, respectively, and requires no attention to partitioning or data placement issues that would be of paramount importance in other parallel architectures.

  2. Hydration of non-polar anti-parallel β-sheets

    NASA Astrophysics Data System (ADS)

    Urbic, Tomaz; Dias, Cristiano L.

    2014-04-01

    In this work we focus on anti-parallel β-sheets to study hydration of side chains and polar groups of the backbone using all-atom molecular dynamics simulations. We show that: (i) water distribution around the backbone does not depend significantly on amino acid sequence, (ii) more water molecules are found around oxygen than nitrogen atoms of the backbone, and (iii) water molecules around nitrogen are highly localized in the plane formed by the peptide backbones. To study hydration around side chains we note that anti-parallel β-sheets exhibit two types of cross-strand pairing: Hydrogen-Bond (HB) and Non-Hydrogen-Bond (NHB) pairing. We show that distributions of water around alanine, leucine, and valine side chains are very different at HB compared to NHB faces. For alanine pairs, the space between side chains has a higher concentration of water if residues are located in the NHB face of the β-sheet as opposed to the HB face. For leucine residues, the HB face is found to be dry while the space between side chains at the NHB face alternates between being occupied and non-occupied by water. Surprisingly, for valine residues the NHB face is dry, whereas the HB face is occupied by water. We postulate that these differences in water distribution are related to context-dependent propensities observed for β-sheets.

  3. Hydration of non-polar anti-parallel β-sheets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Urbic, Tomaz; Dias, Cristiano L., E-mail: cld@njit.edu

    2014-04-28

    In this work we focus on anti-parallel β-sheets to study hydration of side chains and polar groups of the backbone using all-atom molecular dynamics simulations. We show that: (i) water distribution around the backbone does not depend significantly on amino acid sequence, (ii) more water molecules are found around oxygen than nitrogen atoms of the backbone, and (iii) water molecules around nitrogen are highly localized in the plane formed by the peptide backbones. To study hydration around side chains we note that anti-parallel β-sheets exhibit two types of cross-strand pairing: Hydrogen-Bond (HB) and Non-Hydrogen-Bond (NHB) pairing. We show that distributions of water around alanine, leucine, and valine side chains are very different at HB compared to NHB faces. For alanine pairs, the space between side chains has a higher concentration of water if residues are located in the NHB face of the β-sheet as opposed to the HB face. For leucine residues, the HB face is found to be dry while the space between side chains at the NHB face alternates between being occupied and non-occupied by water. Surprisingly, for valine residues the NHB face is dry, whereas the HB face is occupied by water. We postulate that these differences in water distribution are related to context-dependent propensities observed for β-sheets.

  4. Parallel Ray Tracing Using the Message Passing Interface

    DTIC Science & Technology

    2007-09-01

    Ray-tracing software is available for lens design and for general optical systems modeling. It tends to be designed to run on a single processor and can be very… Index terms: National Aeronautics and Space Administration (NASA), optical ray tracing, parallel computing, parallel processing, prime numbers, ray tracing.

  5. PISCES: An environment for parallel scientific computation

    NASA Technical Reports Server (NTRS)

    Pratt, T. W.

    1985-01-01

    The Parallel Implementation of Scientific Computing Environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77-based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is on providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task-level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.

  6. Framework for Parallel Preprocessing of Microarray Data Using Hadoop

    PubMed Central

    2018-01-01

    Nowadays, microarray technology has become one of the popular ways to study gene expression and the diagnosis of disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that need to be preprocessed, since they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard and popular methods utilized to preprocess the data and remove the noise. Most preprocessing algorithms are time-consuming and not able to handle a large number of datasets with thousands of experiments. Parallel processing can be used to address the above-mentioned issues. Hadoop is a well-known and ideal distributed file system framework that provides a parallel environment to run the experiments. In this research, for the first time, the capability of Hadoop and the statistical power of R have been leveraged to parallelize the available preprocessing algorithm called RMA to efficiently process microarray data. The experiment has been run on a cluster containing 5 nodes, each with 16 cores and 16 GB memory. It compares the efficiency and performance of parallelized RMA using Hadoop with parallelized RMA using the affyPara package as well as sequential RMA. The results show that the speed-up rate of the proposed approach outperforms both the sequential approach and the affyPara approach. PMID:29796018

  7. A real-time MPEG software decoder using a portable message-passing library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

    1995-12-31

    We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easily ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment under which it runs. Several technical issues are discussed, including balancing of decoding speed, memory limitations, I/O capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible on a general-purpose parallel machine.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peterson, Steven K

    The U.S. Department of Energy (DOE) has a significant programmatic interest in the safe and secure routing and transportation of Spent Nuclear Fuel (SNF) and High Level Waste (HLW) in the United States, including shipments entering the country from locations outside U.S. borders. In any shipment of SNF/HLW, there are multiple chains: a jurisdictional chain as the material moves between jurisdictions (state, federal, tribal, administrative), a physical supply chain (which mode), as well as a custody chain (which stakeholder is in charge/possession) of the materials being transported. Within these interconnected networks lie vulnerabilities, whether in lack of communication between interested stakeholders or physical vulnerabilities such as interdiction. By identifying key links and nodes as well as administrative weaknesses, decisions can be made to harden the physical network and improve communication between stakeholders. This paper examines the parallel chains of oversight and custody as well as the chain of stakeholder interests for the shipments of SNF/HLW and the potential impacts on systemic resiliency. Using the Crystal River shutdown location as well as a hypothetical international shipment brought into the United States, this paper illustrates the parallel chains and maps them out visually.

  9. 5. Aerial view of turnpike path running through center of ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    5. Aerial view of turnpike path running through center of photograph along row of trees. 1917 realignment visible along left edge of photograph along edge of forest. Modernized alignment resumes at top right of photograph. View looking north. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY

  10. Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

    PubMed

    Komarov, Ivan; D'Souza, Roshan M

    2012-01-01

    The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques for simulating reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of the GSSA are prohibitively expensive for computing parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multiprocessor. Warps executing in parallel on different multiprocessors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
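    For reference, a serial direct-method GSSA for a single reaction A + B → C fits in a few lines (the paper's contribution, mapping many realizations onto GPU warps with custom data structures, is not reproduced here):

      import math, random

      def gillespie(a, b, c, k, t_end):
          t = 0.0
          while t < t_end:
              propensity = k * a * b
              if propensity == 0.0:
                  break  # no reactants left
              # exponentially distributed waiting time to the next event
              t += -math.log(1.0 - random.random()) / propensity
              a, b, c = a - 1, b - 1, c + 1  # fire A + B -> C
          return a, b, c, t

      print(gillespie(1000, 800, 0, 1e-4, 10.0))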

  11. Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations

    NASA Astrophysics Data System (ADS)

    Hunsaker, Isaac L.

    Radiation is the dominant mode of heat transfer in high-temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation-turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal Monte Carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the Message Passing Interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.

  12. Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers

    NASA Technical Reports Server (NTRS)

    Wissink, Andrew M.; Lyrintzis, Anastasios S.; Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

    1996-01-01

    This paper presents parallel implementations of two codes used in a combined CFD/Kirchhoff methodology to predict the aerodynamic and aeroacoustic properties of helicopters. The rotorcraft Navier-Stokes code, TURNS, computes the aerodynamic flowfield near the helicopter blades and the Kirchhoff acoustics code computes the noise in the far field, using the TURNS solution as input. The overall parallel strategy adds MPI message-passing calls to the existing serial codes to allow for communication between processors. As a result, the total code modifications required for parallel execution are relatively small. The biggest bottleneck in running the TURNS code in parallel comes from the LU-SGS algorithm that solves the implicit system of equations. We use a new hybrid domain decomposition implementation of LU-SGS to obtain good parallel performance on the SP-2. TURNS demonstrates excellent parallel speedups for quasi-steady and unsteady three-dimensional calculations of a helicopter blade in forward flight. The execution rate attained by the code on 114 processors is six times faster than the same cases run on one processor of the Cray C-90. The parallel Kirchhoff code also shows excellent parallel speedups and fast execution rates. As a performance demonstration, unsteady acoustic pressures are computed at 1886 far-field observer locations for a sample acoustics problem. The calculation requires over two hundred hours of CPU time on one C-90 processor but takes only a few hours on 80 processors of the SP-2. The resultant far-field acoustic field is analyzed with state-of-the-art audio and video rendering of the propagating acoustic signals.

  13. Population annealing with weighted averages: A Monte Carlo method for rough free-energy landscapes

    NASA Astrophysics Data System (ADS)

    Machta, J.

    2010-08-01

    The population annealing algorithm introduced by Hukushima and Iba is described. Population annealing combines simulated annealing and Boltzmann weighted differential reproduction within a population of replicas to sample equilibrium states. Population annealing gives direct access to the free energy. It is shown that unbiased measurements of observables can be obtained by weighted averages over many runs with weight factors related to the free-energy estimate from the run. Population annealing is well suited to parallelization and may be a useful alternative to parallel tempering for systems with rough free-energy landscapes such as spin glasses. The method is demonstrated for spin glasses.
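    The weighted-average step can be sketched as follows (sign and weighting conventions here are assumptions for illustration; see the paper for the precise estimator):

      # Combine observable estimates from independent runs, weighting each
      # run by exp(-beta * F_r) built from its free-energy estimate F_r.
      import math

      def weighted_average(observables, free_energies, beta):
          f0 = min(free_energies)  # shift to stabilize the exponentials
          w = [math.exp(-beta * (f - f0)) for f in free_energies]
          return sum(wi * o for wi, o in zip(w, observables)) / sum(w)

      print(weighted_average([0.52, 0.48, 0.50], [10.1, 10.3, 10.2], beta=1.0))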

  14. Local rollback for fault-tolerance in parallel computing systems

    DOEpatents

    Blumrich, Matthias A [Yorktown Heights, NY; Chen, Dong [Yorktown Heights, NY; Gara, Alan [Yorktown Heights, NY; Giampapa, Mark E [Yorktown Heights, NY; Heidelberger, Philip [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Steinmacher-Burow, Burkhard [Boeblingen, DE; Sugavanam, Krishnan [Yorktown Heights, NY

    2012-01-24

    A control logic device performs a local rollback in a parallel supercomputing system. The supercomputing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.

  15. The seasonal-cycle climate model

    NASA Technical Reports Server (NTRS)

    Marx, L.; Randall, D. A.

    1981-01-01

    The seasonal cycle run, which will become the control run for the comparison with runs utilizing codes and parameterizations developed by outside investigators, is discussed. The climate model currently exists in two parallel versions: one running on the Amdahl and the other running on the CYBER 203. These two versions are as nearly identical as machine capability and the requirement for high-speed performance will allow. Developmental changes are made on the Amdahl/CMS version for ease of testing and rapidity of turnaround. The changes are subsequently incorporated into the CYBER 203 version using vectorization techniques where speed improvement can be realized. The 400-day seasonal cycle run serves as a control run for both medium and long range climate forecasts as well as sensitivity studies.

  16. Performance of the NOνA Data Acquisition and Trigger Systems for the full 14 kT Far Detector

    NASA Astrophysics Data System (ADS)

    Norman, A.; Davies, G. S.; Ding, P. F.; Dukes, E. C.; Duyan, H.; Frank, M. J.; R. C. Group; Habig, A.; Henderson, W.; Niner, E.; Mina, R.; Moren, A.; Mualem, L.; Oksuzian, Y.; Rebel, B.; Shanahan, P.; Sheshukov, A.; Tamsett, M.; Tomsen, K.; Vinton, L.; Wang, Z.; Zamorano, B.; Zirnstien, J.

    2015-12-01

    The NOvA experiment uses a continuous, free-running, dead-timeless data acquisition system to collect data from the 14 kT far detector. The DAQ system reads out the more than 344,000 detector channels and assembles the information into a raw, unfiltered, high-bandwidth data stream. The NOvA trigger systems operate in parallel to the readout and asynchronously to the primary DAQ readout/event-building chain. The data-driven triggering systems for NOvA are unique in that they examine long contiguous time windows of the high-resolution readout data and enable the detector to be sensitive to a wide range of physics interactions, from those with fast, nanosecond-scale signals up to processes with long delayed coincidences between hits which occur at the tens-of-milliseconds time scale. The trigger system is able to achieve a true 100% live time for the detector, making it sensitive to both beam-spill-related and off-spill physics.

  17. Analytical study on the generalized Davydov model in the alpha helical proteins

    NASA Astrophysics Data System (ADS)

    Wang, Pan; Xiao, Shu-Hong; Chen, Li; Yang, Gang

    2017-06-01

    In this paper, we investigate the dynamics of a generalized Davydov model derived from an infinite chain of alpha helical protein molecules which contain three hydrogen-bonding spines running almost parallel to the helical axis. Through the introduction of an auxiliary function, the bilinear form and one-, two- and three-soliton solutions for the generalized Davydov model are obtained. Propagation and interactions of solitons have been investigated analytically and graphically. The amplitude of the soliton is only related to the complex parameter μ and the real parameter 𝜃 with a range of [0, 2π]. The velocity of the soliton is only related to the complex parameter μ, real parameter 𝜃, lattice parameter 𝜀, and physical parameters β1, β3 and β4. Overtaking and head-on interactions of two and three solitons are presented. A feature common to the three-soliton interactions is that the directions of the solitons change after the interactions. The soliton derived in this paper is expected to have potential applications in alpha helical proteins.

  18. Iterative Importance Sampling Algorithms for Parameter Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.

    In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near-perfect scaling with the number of cores on high-performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms that take an iterative approach to constructing a proposal distribution have been proposed over the past years. We investigate the applicability of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.
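    A compact sketch of one such iteration over the mean and covariance of a Gaussian proposal is given below (a generic scheme under stated assumptions, not the authors' exact algorithm; log_posterior stands in for the expensive model evaluation, and each batch of draws is independent and hence parallelizable):

      import numpy as np

      def iterate_proposal(log_posterior, mu, cov, n=1000, iters=5, seed=0):
          rng = np.random.default_rng(seed)
          for _ in range(iters):
              x = rng.multivariate_normal(mu, cov, size=n)  # independent draws
              d = x - mu
              # Gaussian proposal log-density up to a constant (the constant
              # cancels in the self-normalized weights).
              log_q = -0.5 * np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)
              log_w = np.array([log_posterior(xi) for xi in x]) - log_q
              w = np.exp(log_w - log_w.max())
              w /= w.sum()
              mu = w @ x                                    # weighted mean
              d = x - mu
              cov = (w[:, None] * d).T @ d + 1e-8 * np.eye(len(mu))
          return mu, cov

      # toy target: isotropic Gaussian centered at (3, 3)
      mu, cov = iterate_proposal(lambda v: -0.5 * ((v - 3.0) ** 2).sum(),
                                 np.zeros(2), np.eye(2))
      print(mu)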

  19. Parallel Event Analysis Under Unix

    NASA Astrophysics Data System (ADS)

    Looney, S.; Nilsson, B. S.; Oest, T.; Pettersson, T.; Ranjard, F.; Thibonnier, J.-P.

    The ALEPH experiment at LEP, the CERN CN division and Digital Equipment Corp. have, in a joint project, developed a parallel event analysis system. The parallel physics code is identical to ALEPH's standard analysis code, ALPHA, only the organisation of input/output is changed. The user may switch between sequential and parallel processing by simply changing one input "card". The initial implementation runs on an 8-node DEC 3000/400 farm, using the PVM software, and exhibits a near-perfect speed-up linearity, reducing the turn-around time by a factor of 8.

  20. RAMA: A file system for massively parallel computers

    NASA Technical Reports Server (NTRS)

    Miller, Ethan L.; Katz, Randy H.

    1993-01-01

    This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.

  1. Fortran code for SU(3) lattice gauge theory with and without MPI checkerboard parallelization

    NASA Astrophysics Data System (ADS)

    Berg, Bernd A.; Wu, Hao

    2012-10-01

    We document plain Fortran and Fortran MPI checkerboard code for Markov chain Monte Carlo simulations of pure SU(3) lattice gauge theory with the Wilson action in D dimensions. The Fortran code uses periodic boundary conditions and is suitable for pedagogical purposes and small-scale simulations. For the Fortran MPI code two geometries are covered: the usual torus with periodic boundary conditions and the double-layered torus as defined in the paper. Parallel computing is performed on checkerboards of sublattices, which partition the full lattice in one, two, and so on, up to D directions (depending on the parameters set). For updating, the Cabibbo-Marinari heatbath algorithm is used. We present validations and test runs of the code. Performance is reported for a number of currently used Fortran compilers and, when applicable, MPI versions. For the parallelized code, performance is studied as a function of the number of processors.
    Program summary
    Program title: STMC2LSU3MPI
    Catalogue identifier: AEMJ_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEMJ_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 26666
    No. of bytes in distributed program, including test data, etc.: 233126
    Distribution format: tar.gz
    Programming language: Fortran 77 compatible with the use of Fortran 90/95 compilers, in part with MPI extensions.
    Computer: Any capable of compiling and executing Fortran 77 or Fortran 90/95, when needed with MPI extensions.
    Operating system: Red Hat Enterprise Linux Server 6.1 with OpenMPI + pgf77 11.8-0, Centos 5.3 with OpenMPI + gfortran 4.1.2, Cray XT4 with MPICH2 + pgf90 11.2-0.
    Has the code been vectorised or parallelized?: Yes, parallelized using MPI extensions.
    Number of processors used: 2 to 11664
    RAM: 200 megabytes per process.
    Classification: 11.5.
    Nature of problem: Physics of pure SU(3) Quantum Field Theory (QFT). This is relevant for our understanding of Quantum Chromodynamics (QCD). It includes the glueball spectrum, topological properties and the deconfining phase transition of pure SU(3) QFT. For instance, Relativistic Heavy Ion Collision (RHIC) experiments at the Brookhaven National Laboratory provide evidence that quarks confined in hadrons undergo at high enough temperature and pressure a transition into a Quark-Gluon Plasma (QGP). Investigations of its thermodynamics in pure SU(3) QFT are of interest.
    Solution method: Markov Chain Monte Carlo (MCMC) simulations of SU(3) Lattice Gauge Theory (LGT) with the Wilson action. This is a regularization of pure SU(3) QFT on a hypercubic lattice, which allows approaching the continuum SU(3) QFT by means of Finite Size Scaling (FSS) studies. Specifically, we provide updating routines for the Cabibbo-Marinari heatbath with and without checkerboard parallelization. While the first is suitable for pedagogical purposes and small-scale projects, the latter allows for efficient parallel processing. Targeting the geometry of RHIC experiments, we have implemented a Double-Layered Torus (DLT) lattice geometry, which has previously not been used in LGT MCMC simulations and enables inside and outside layers at distinct temperatures, the lower-temperature layer acting as the outside boundary for the higher-temperature layer, where the deconfinement transition goes on.
    Restrictions: The checkerboard partition of the lattice makes the development of measurement programs more tedious than is the case for an unpartitioned lattice. Presently, only one measurement routine, for Polyakov loops, is provided.
    Unusual features: We provide three different versions of the send/receive function of the MPI library, which work for different operating system + compiler + MPI combinations. This involves activating the correct row in the last three rows of our latmpi.par parameter file. The underlying reason is distinct buffer conventions.
    Running time: For a typical run using an Intel i7 processor, it takes (1.8-6) E-06 seconds to update one link of the lattice, depending on the compiler used. For example, for a simulation on a small (4 * 8³) DLT lattice with a statistics of 221 sweeps (i.e., updating the two lattice layers of 4 * (4 * 8³) links each 221 times), the total CPU time needed can be 2 * 4 * (4 * 8³) * 221 * 3 E-06 seconds = 1.7 minutes, where
      2 — two layers of the lattice,
      4 — four dimensions,
      4 * 8³ — lattice size,
      221 — sweeps of updating,
      6 E-06 s — average time to update one link variable.
    If we divide the job into 8 parallel processes, then the real time is (for negligible communication overhead) 1.7 mins / 8 = 0.2 mins.
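    The checkerboard idea itself is easy to picture (a simplified sketch; actual SU(3) link updates read neighboring staples, which the distributed code handles with checkerboards of sublattices rather than single sites):

      # Color lattice sites by the parity of their coordinate sum; all
      # sites of one color can then be updated concurrently.
      import itertools

      def checkerboards(dims):
          even, odd = [], []
          for site in itertools.product(*(range(n) for n in dims)):
              (even if sum(site) % 2 == 0 else odd).append(site)
          return even, odd

      even, odd = checkerboards((4, 8, 8, 8))  # a 4 * 8^3 lattice
      print(len(even), len(odd))               # two independent halves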

  2. ATLAS Tile Calorimeter calibration and monitoring systems

    NASA Astrophysics Data System (ADS)

    Cortés-González, Arely

    2018-01-01

    The ATLAS Tile Calorimeter is the central section of the hadronic calorimeter of the ATLAS experiment and provides important information for the reconstruction of hadrons, jets, hadronic decays of tau leptons and missing transverse energy. This sampling calorimeter uses steel plates as absorber and scintillating tiles as active medium. The light produced by the passage of charged particles is transmitted by wavelength-shifting fibres to photomultiplier tubes, located in the outer part of the calorimeter. Neutral particles may also produce a signal after interacting with the material and producing charged particles. The readout is segmented into about 5000 cells, each of them being read out by two photomultipliers in parallel. To calibrate and monitor the stability and performance of each part of the readout chain during data taking, a set of calibration systems is used. This comprises cesium radioactive sources, a laser system, charge injection elements and an integrator-based readout system. Information from all systems allows the calorimeter response to be monitored and equalised at each stage of the signal production, from scintillation light to digitisation. Calibration runs are monitored from a data quality perspective and used as a cross-check for physics runs. The data quality efficiency achieved during 2016 was 98.9%. The calibration and stability results reported here show that the TileCal performance is within the design requirements and that the calorimeter has made an essential contribution to reconstructed objects and physics results.

  3. Frustrated magnetism in the spin-chain metal Yb2Fe12P7

    DOE PAGES

    Baumbach, Ryan E.; Hamlin, James J.; Janoschek, Marc; ...

    2016-01-08

    Here, magnetization measurements for magnetic fields μ0H up to 60 T are reported for the noncentrosymmetric spin-chain metal Yb2Fe12P7. These measurements reveal behavior that is consistent with Ising-like spin-chain magnetism that produces pronounced spin degeneracy. In particular, we find that although a Brillouin field dependence is observed in M(H) for H ⊥ c with a saturation moment that is close to the expected value for free ions of Yb3+, non-Brillouin-like behavior is seen for H ∥ c with an initial saturation moment that is nearly half the free-ion value. In addition, hysteretic behavior that extends above the ordering temperature T_M is seen for H ∥ c but not for H ⊥ c, suggesting out-of-equilibrium physics. This point of view is strengthened by the observation of a spin reconfiguration in the ordered state for H ∥ c which is only seen for T ≤ T_M and after polarizing the spins. Together with the heat capacity data, these results suggest that the anomalous low temperature phenomena that were previously reported are driven by spin degeneracy that is related to the Ising-like one-dimensional chain-like configuration of the Yb ions.

  4. The cognitive architecture for chaining of two mental operations.

    PubMed

    Sackur, Jérôme; Dehaene, Stanislas

    2009-05-01

    A simple view, which dates back to Turing, proposes that complex cognitive operations are composed of serially arranged elementary operations, each passing intermediate results to the next. However, whether and how such serial processing is achieved with a brain composed of massively parallel processors remains an open question. Here, we study the cognitive architecture for chained operations with an elementary arithmetic algorithm: we required participants to add (or subtract) two to a digit, and then compare the result with five. In four experiments, we probed the internal implementation of this task with chronometric analysis, the cued-response method, the priming method, and a subliminal forced-choice procedure. We found evidence for approximately sequential processing, with an important qualification: the second operation in the algorithm appears to start before completion of the first operation. Furthermore, initially the second operation takes as input the stimulus number rather than the output of the first operation. Thus, operations that should be processed serially are in fact executed partially in parallel. Furthermore, although each elementary operation can proceed subliminally, their chaining does not occur in the absence of conscious perception. Overall, the results suggest that chaining is slow, effortful, imperfect (resulting partly in parallel rather than serial execution) and dependent on conscious control.

  5. Generating unstructured nuclear reactor core meshes in parallel

    DOE PAGES

    Jain, Rajeev; Tautges, Timothy J.

    2014-10-24

    Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulation problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate on standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during the reactor assembly and reactor core mesh generation processes. We highlight several reactor core examples, including a very high temperature reactor, a full-core model of the Japanese MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with an XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.

  6. Roofline model toolkit: A practical tool for architectural and program analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian

    We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented microbenchmarks implemented with the Message Passing Interface (MPI) and OpenMP, used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory management mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
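    The Roofline model underlying the toolkit reduces to one line; the peak numbers below are placeholders, not measurements from the paper:

      # Attainable performance is the lesser of peak compute and peak
      # memory bandwidth times arithmetic intensity (FLOPs per byte).
      def roofline(peak_gflops, peak_gbs, arithmetic_intensity):
          return min(peak_gflops, peak_gbs * arithmetic_intensity)

      for ai in (0.25, 1.0, 4.0, 16.0):
          print(ai, roofline(200.0, 30.0, ai), "GFLOP/s")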

  7. Parallelization of a hydrological model using the message passing interface

    USGS Publications Warehouse

    Wu, Yiping; Li, Tiejian; Sun, Liqun; Chen, Ji

    2013-01-01

    With increasing knowledge about natural processes, hydrological models such as the Soil and Water Assessment Tool (SWAT) are becoming larger and more complex, with increasing computation time. Additionally, other procedures such as model calibration, which may require thousands of model iterations, can increase running time and thus further hinder rapid modeling and analysis. Using the widely applied SWAT as an example, this study demonstrates how to parallelize a serial hydrological model in a Windows® environment using a parallel programming technology, the Message Passing Interface (MPI). With a case study, we derived the optimal values for the two parameters (the number of processes and the corresponding percentage of work to be distributed to the master process) of the parallel SWAT (P-SWAT) on an ordinary personal computer and a workstation. Our study indicates that model execution time can be reduced by 42%–70% (or a speedup of 1.74–3.36) using multiple processes (two to five) with a proper task-distribution scheme (between the master and slave processes). Although the computation time cost becomes lower with an increasing number of processes (from two to five), this enhancement diminishes due to the accompanying increase in demand for message-passing procedures between the master and all slave processes. Our case study demonstrates that P-SWAT with a five-process run may reach the maximum speedup, and the performance can be quite stable (fairly independent of project size). Overall, P-SWAT can help reduce the computation time substantially for an individual model run, manual and automatic calibration procedures, and optimization of best management practices. In particular, the parallelization method we used and the scheme for deriving the optimal parameters in this study can be valuable and easily applied to other hydrological or environmental models.
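    The master/worker structure the paper describes can be outlined with mpi4py (a schematic only; simulate() is a hypothetical stand-in for one unit of SWAT work, and P-SWAT's actual task-distribution scheme assigns the master a tunable share):

      from mpi4py import MPI

      def simulate(item):
          return item * item  # placeholder for one unit of model work

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      items = list(range(100))
      mine = items[rank::size]                 # static round-robin distribution
      partial = [simulate(i) for i in mine]
      results = comm.gather(partial, root=0)   # master collects all results

      if rank == 0:
          flat = [r for chunk in results for r in chunk]
          print(len(flat), "items processed on", size, "processes")

    Run with, e.g., mpiexec -n 4 python pswat_sketch.py (the script name is hypothetical).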

  8. Parameter-induced uncertainty quantification of crop yields, soil N2O and CO2 emission for 8 arable sites across Europe using the LandscapeDNDC model

    NASA Astrophysics Data System (ADS)

    Santabarbara, Ignacio; Haas, Edwin; Kraus, David; Herrera, Saul; Klatt, Steffen; Kiese, Ralf

    2014-05-01

    When using biogeochemical models to estimate greenhouse gas emissions at site to regional/national levels, the assessment and quantification of the uncertainties of simulation results are of significant importance. The uncertainties in simulation results of process-based ecosystem models may result from uncertainties in the process parameters that describe the model's processes, from model structure inadequacy, as well as from uncertainties in the observations. Data for development and testing of the uncertainty analysis were crop yield observations and measurements of soil fluxes of nitrous oxide (N2O) and carbon dioxide (CO2) from 8 arable sites across Europe. Using the process-based biogeochemical model LandscapeDNDC for simulating crop yields, N2O and CO2 emissions, our aim is to assess the simulation uncertainty by setting up a Bayesian framework based on the Metropolis-Hastings algorithm. Using the Gelman convergence criterion and parallel computing techniques, we enable multiple Markov chains to run independently in parallel, creating random walks that estimate the joint model parameter distribution. From the resulting distribution we limit the parameter space, obtain probabilities of parameter values, and uncover the complex dependencies among them. With this parameter distribution, which determines soil-atmosphere C and N exchange, we are able to obtain the parameter-induced uncertainty of simulation results and compare them with the measurement data.
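    The single-chain building block is ordinary random-walk Metropolis-Hastings; running several independent copies (one per process) and monitoring them with the Gelman-Rubin statistic gives the scheme described above. A minimal sketch with a toy target:

      import math, random

      def metropolis(log_post, x0, step, n):
          x, lp = x0, log_post(x0)
          chain = []
          for _ in range(n):
              y = x + random.gauss(0.0, step)  # symmetric proposal
              lpy = log_post(y)
              if math.log(random.random()) < lpy - lp:
                  x, lp = y, lpy               # accept
              chain.append(x)
          return chain

      # toy target: standard normal log-density up to a constant
      samples = metropolis(lambda v: -0.5 * v * v, 0.0, 1.0, 5000)
      print(sum(samples) / len(samples))  # near 0 for a converged chain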

  9. Polymerase chain reaction system

    DOEpatents

    Benett, William J.; Richards, James B.; Stratton, Paul L.; Hadley, Dean R.; Milanovich, Fred P.; Belgrader, Phil; Meyer, Peter L.

    2004-03-02

    A portable polymerase chain reaction DNA amplification and detection system includes one or more chamber modules. Each module supports a duplex assay of a biological sample. Each module has two parallel interrogation ports with a linear optical system. The system is capable of being handheld.

  10. Parallel O(log n) algorithms for open- and closed-chain rigid multibody systems based on a new mass matrix factorization technique

    NASA Technical Reports Server (NTRS)

    Fijany, Amir

    1993-01-01

    In this paper, parallel O(log n) algorithms for computation of rigid multibody dynamics are developed. These parallel algorithms are derived by parallelization of new O(n) algorithms for the problem. The underlying feature of these O(n) algorithms is a drastically different strategy for decomposition of interbody force which leads to a new factorization of the mass matrix (M). Specifically, it is shown that a factorization of the inverse of the mass matrix in the form of the Schur complement is derived as M^{-1} = C - B^{*}A^{-1}B, wherein matrices C, A, and B are block tridiagonal matrices. The new O(n) algorithm is then derived as a recursive implementation of this factorization of M^{-1}. For closed-chain systems, similar factorizations and O(n) algorithms for computation of the Operational Space Mass Matrix Λ and its inverse Λ^{-1} are also derived. It is shown that these O(n) algorithms are strictly parallel, that is, they are less efficient than other algorithms for serial computation of the problem. But, to our knowledge, they are the only known algorithms that can be parallelized and that lead to both time- and processor-optimal parallel algorithms for the problem, i.e., parallel O(log n) algorithms with O(n) processors. The developed parallel algorithms, in addition to their theoretical significance, are also practical from an implementation point of view due to their simple architectural requirements.
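    In standard notation the factorization reads (a restatement of the abstract's formula, with the asterisk denoting the adjoint):

      \[
        M^{-1} \;=\; C \;-\; B^{*} A^{-1} B ,
        \qquad A,\ B,\ C \ \text{block tridiagonal},
      \]

    and it is this block-tridiagonal structure of A, B, and C that admits parallel O(log n) evaluation.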

  11. Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

    PubMed Central

    Wang, Min; Tian, Yun

    2018-01-01

    The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance. PMID:29861711

  12. Digital tomosynthesis mammography using a parallel maximum-likelihood reconstruction method

    NASA Astrophysics Data System (ADS)

    Wu, Tao; Zhang, Juemin; Moore, Richard; Rafferty, Elizabeth; Kopans, Daniel; Meleis, Waleed; Kaeli, David

    2004-05-01

    A parallel reconstruction method, based on an iterative maximum-likelihood (ML) algorithm, is developed to provide fast reconstruction for digital tomosynthesis mammography. Tomosynthesis mammography acquires 11 low-dose projections of a breast by moving an x-ray tube over a 50° angular range. In parallel reconstruction, each projection is divided into multiple segments along the chest-to-nipple direction. Using the 11 projections, segments located at the same distance from the chest wall are combined to compute a partial reconstruction of the total breast volume. Each partial reconstruction forms a thin slab angled toward the x-ray source at the 0° projection angle. The reconstruction of the total breast volume is obtained by merging the partial reconstructions. The overlap region between neighboring partial reconstructions and neighboring projection segments is utilized to compensate for the incomplete data at the boundary locations present in the partial reconstructions. A serial execution of the reconstruction is compared to a parallel implementation, using clinical data. The serial code was run on a PC with a single Pentium IV 2.2 GHz CPU. The parallel implementation was developed using MPI and run on a 64-node Linux cluster using 800 MHz Itanium CPUs. The serial reconstruction for a medium-sized breast (5 cm thickness, 11 cm chest-to-nipple distance) takes 115 minutes, while the parallel implementation takes only 3.5 minutes. The reconstruction for a larger breast takes 187 minutes serially and 6.5 minutes in parallel. No significant differences were observed between the reconstructions produced by the serial and parallel implementations.
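    The decomposition pattern (one chest-to-nipple segment per MPI rank, reconstructed independently and merged at the root) can be sketched with mpi4py as below; reconstruct_segment() is a hypothetical placeholder for the iterative ML update, and the overlap handling is reduced to a comment.

        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        overlap = 2  # boundary slices shared with neighbouring segments

        def reconstruct_segment(seg_id):
            # Placeholder: run the iterative ML update on projection segment
            # seg_id, including `overlap` extra slices on each side.
            return {"segment": seg_id, "volume": None}

        partial = reconstruct_segment(rank)      # one segment per rank
        partials = comm.gather(partial, root=0)
        if rank == 0:
            # Merge the slabs, averaging the overlapped boundary slices.
            print("merged", len(partials), "partial reconstructions")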

  13. A parallel algorithm for step- and chain-growth polymerization in molecular dynamics.

    PubMed

    de Buyl, Pierre; Nies, Erik

    2015-04-07

    Classical Molecular Dynamics (MD) simulations provide insight into the properties of many soft-matter systems. In some situations, it is interesting to model the creation of chemical bonds, a process that is not part of the MD framework. In this context, we propose a parallel algorithm for step- and chain-growth polymerization that is based on a generic reaction scheme, works at a given intrinsic rate and produces continuous trajectories. We present an implementation in the ESPResSo++ simulation software and compare it with the corresponding feature in LAMMPS. For chain growth, our results are compared to the existing simulation literature. For step growth, a rate equation is proposed for the evolution of the crosslinker population that compares well to the simulations for low crosslinker functionality or for short times.
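    A minimal serial sketch of the reaction step is given below (the paper's algorithm additionally handles the parallel domain decomposition and monomer functionality bookkeeping): unreacted pairs within a capture radius form a bond with probability 1 - exp(-rate*dt), which is one concrete reading of "works at a given intrinsic rate". All names are illustrative.

        import numpy as np

        def polymerize_step(positions, free_ends, bonds, rate, dt, r_cut, rng):
            # One sweep over candidate pairs of unreacted ends; a pair closer
            # than r_cut bonds with probability 1 - exp(-rate * dt).
            p_react = 1.0 - np.exp(-rate * dt)
            for i in sorted(free_ends):
                if i not in free_ends:
                    continue  # consumed earlier in this sweep
                for j in sorted(free_ends):
                    if j <= i or j not in free_ends:
                        continue
                    near = np.linalg.norm(positions[i] - positions[j]) < r_cut
                    if near and rng.random() < p_react:
                        bonds.append((i, j))
                        free_ends.discard(i)
                        free_ends.discard(j)
                        break
            return bonds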

  14. A parallel algorithm for step- and chain-growth polymerization in molecular dynamics

    NASA Astrophysics Data System (ADS)

    de Buyl, Pierre; Nies, Erik

    2015-04-01

    Classical Molecular Dynamics (MD) simulations provide insight into the properties of many soft-matter systems. In some situations, it is interesting to model the creation of chemical bonds, a process that is not part of the MD framework. In this context, we propose a parallel algorithm for step- and chain-growth polymerization that is based on a generic reaction scheme, works at a given intrinsic rate and produces continuous trajectories. We present an implementation in the ESPResSo++ simulation software and compare it with the corresponding feature in LAMMPS. For chain growth, our results are compared to the existing simulation literature. For step growth, a rate equation is proposed for the evolution of the crosslinker population that compares well to the simulations for low crosslinker functionality or for short times.

  15. Three closely related (2E,2'E)-3,3'-(1,4-phenylene)bis[1-(methoxyphenyl)prop-2-en-1-ones]: supramolecular assemblies in one dimension mediated by hydrogen bonding and C-H⋯π interactions.

    PubMed

    Sim, Aijia; Chidan Kumar, C S; Kwong, Huey Chong; Then, Li Yee; Win, Yip-Foo; Quah, Ching Kheng; Naveen, S; Chandraju, S; Lokanath, N K; Warad, Ismail

    2017-06-01

    In the title compounds, (2E,2'E)-3,3'-(1,4-phenylene)bis[1-(2-methoxyphenyl)prop-2-en-1-one], C26H22O4 (I), (2E,2'E)-3,3'-(1,4-phenylene)bis[1-(3-methoxyphenyl)prop-2-en-1-one], C26H22O4 (II), and (2E,2'E)-3,3'-(1,4-phenylene)bis[1-(3,4-dimethoxyphenyl)prop-2-en-1-one], C28H26O6 (III), the asymmetric unit consists of a half-molecule, completed by crystallographic inversion symmetry. The dihedral angles between the central and terminal benzene rings are 56.98 (8), 7.74 (7) and 7.73 (7)° for (I), (II) and (III), respectively. In the crystal of (I), molecules are linked by pairs of C-H⋯π interactions into chains running parallel to [101]. The packing for (II) and (III) features inversion dimers linked by pairs of C-H⋯O hydrogen bonds, forming R2(2)(16) and R2(2)(14) ring motifs, respectively, as parts of [201] and [101] chains, respectively.

  16. Non-volatile memory for checkpoint storage

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blumrich, Matthias A.; Chen, Dong; Cipolla, Thomas M.

    A system, method and computer program product for supporting system-initiated checkpoints in high-performance parallel computing systems and storing of checkpoint data to a non-volatile memory storage device. The system and method generate selective control signals to perform checkpointing of system-related data in the presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity. In one embodiment, the non-volatile memory is a pluggable flash memory card.

  17. Multitasking for flows about multiple body configurations using the chimera grid scheme

    NASA Technical Reports Server (NTRS)

    Dougherty, F. C.; Morgan, R. L.

    1987-01-01

    The multitasking of a finite-difference scheme using multiple overset meshes is described. In this chimera, or multiple overset mesh, approach, a multiple-body configuration is mapped using a major grid about the main component of the configuration, with minor overset meshes used to map each additional component. This type of code is well suited to multitasking. Both steady and unsteady two-dimensional computations are run on parallel processors on a Cray X-MP/48, usually with one mesh per processor. Flow field results are compared with single-processor results to demonstrate the feasibility of running multiple-mesh codes on parallel processors and to show the increase in efficiency.

  18. Parallel and serial grouping of image elements in visual perception.

    PubMed

    Houtkamp, Roos; Roelfsema, Pieter R

    2010-12-01

    The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some situations, but we demonstrate that there are also situations where Gestalt grouping becomes serial. We observe substantial time delays when image elements have to be grouped indirectly through a chain of local groupings. We call this chaining process incremental grouping and demonstrate that it can occur for only a single object at a time. We suggest that incremental grouping requires the gradual spread of object-based attention so that eventually all the object's parts become grouped explicitly by an attentional labeling process. Our findings inspire a new incremental grouping theory that relates the parallel, local grouping process to feedforward processing and the serial, incremental grouping process to recurrent processing in the visual cortex.

  19. Design considerations for parallel graphics libraries

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  20. Density-based parallel skin lesion border detection with webCL

    PubMed Central

    2015-01-01

    Background Dermoscopy is a highly effective and noninvasive imaging technique used in the diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand-drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. Methods A fast density-based skin lesion border detection method has been implemented in parallel with a new parallel technology called WebCL. WebCL utilizes client-side computing capabilities to use available hardware resources such as multi-core CPUs and GPUs. The resulting WebCL-parallel density-based skin lesion border detection method runs efficiently from internet browsers. Results Previous research indicates that density-based clustering techniques achieve among the highest accuracy rates for skin lesion border detection. While these algorithms have unfavorable time complexities, this effect can be mitigated when they are implemented in parallel. In this study, a density-based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on heterogeneous platforms (e.g. tablets, smartphones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in clinical settings. For this, we used WebCL, an emerging technology that enables an HTML5 Web browser to execute code in parallel on heterogeneous platforms. We describe WebCL and our parallel algorithm design. In addition, we tested the parallel code on 100 dermoscopy images and measured the execution speedups with respect to the serial version. Results indicate that the parallel (WebCL) and serial versions of the density-based lesion border detection method generate the same accuracy rates for the 100 dermoscopy images: the mean border error is 6.94%, the mean recall is 76.66%, and the mean precision is 99.29%. Moreover, the WebCL version's speedup factor for lesion border detection on the 100 dermoscopy images averages around 491.2. Conclusions Given the large number of high-resolution dermoscopy images encountered in a typical clinical setting, and the critical importance of detecting and diagnosing melanoma before metastasis, the importance of processing dermoscopy images quickly is obvious. In this paper, we introduce WebCL and its use for biomedical image processing applications. WebCL is a JavaScript binding of OpenCL that takes advantage of GPU computing from a web browser. The WebCL-parallel version of density-based skin lesion border detection introduced in this study can therefore supplement expert dermatologists and aid them in the early diagnosis of skin lesions. While WebCL is currently an emerging technology, full adoption of WebCL into the HTML5 standard would allow this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources, including multi-core CPUs and GPUs on a local machine, and allows compiled code to run directly from the Web browser. PMID:26423836

  1. Density-based parallel skin lesion border detection with webCL.

    PubMed

    Lemon, James; Kockara, Sinan; Halic, Tansel; Mete, Mutlu

    2015-01-01

    Dermoscopy is a highly effective and noninvasive imaging technique used in the diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand-drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. A fast density-based skin lesion border detection method has been implemented in parallel with a new parallel technology called WebCL. WebCL utilizes client-side computing capabilities to use available hardware resources such as multi-core CPUs and GPUs. The resulting WebCL-parallel density-based skin lesion border detection method runs efficiently from internet browsers. Previous research indicates that density-based clustering techniques achieve among the highest accuracy rates for skin lesion border detection. While these algorithms have unfavorable time complexities, this effect can be mitigated when they are implemented in parallel. In this study, a density-based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on heterogeneous platforms (e.g. tablets, smartphones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in clinical settings. For this, we used WebCL, an emerging technology that enables an HTML5 Web browser to execute code in parallel on heterogeneous platforms. We describe WebCL and our parallel algorithm design. In addition, we tested the parallel code on 100 dermoscopy images and measured the execution speedups with respect to the serial version. Results indicate that the parallel (WebCL) and serial versions of the density-based lesion border detection method generate the same accuracy rates for the 100 dermoscopy images: the mean border error is 6.94%, the mean recall is 76.66%, and the mean precision is 99.29%. Moreover, the WebCL version's speedup factor for lesion border detection on the 100 dermoscopy images averages around 491.2. Given the large number of high-resolution dermoscopy images encountered in a typical clinical setting, and the critical importance of detecting and diagnosing melanoma before metastasis, the importance of processing dermoscopy images quickly is obvious. In this paper, we introduce WebCL and its use for biomedical image processing applications. WebCL is a JavaScript binding of OpenCL that takes advantage of GPU computing from a web browser. The WebCL-parallel version of density-based skin lesion border detection introduced in this study can therefore supplement expert dermatologists and aid them in the early diagnosis of skin lesions. While WebCL is currently an emerging technology, full adoption of WebCL into the HTML5 standard would allow this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources, including multi-core CPUs and GPUs on a local machine, and allows compiled code to run directly from the Web browser.
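    A serial Python analogue of the density-based step may make the approach concrete (the paper's contribution is running this computation from the browser on GPUs via WebCL; the parameters below are invented): cluster candidate lesion pixels with DBSCAN and keep the largest cluster as the lesion region, whose boundary gives the border.

        import numpy as np
        from sklearn.cluster import DBSCAN

        def lesion_pixels(candidate_xy, eps=3.0, min_samples=8):
            # candidate_xy: (N, 2) coordinates of dark (candidate lesion) pixels.
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(candidate_xy)
            clustered = labels[labels >= 0]
            if clustered.size == 0:
                return np.empty((0, 2))   # all points classified as noise
            biggest = np.bincount(clustered).argmax()
            return candidate_xy[labels == biggest]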

  2. 1. Aerial view of turnpike path running diagonally up from ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    1. Aerial view of turnpike path running diagonally up from lower left (present-day Orange Turnpike alignment) and continuing on towards upper right through the tree clump in the center of the bare spot on the landscape, and on through the trees. View looking south. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY

  3. 27 CFR 9.212 - Leona Valley.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... approximately 0.25 mile to its intersection with a trail and the 3,800-foot elevation line, T6N, R13W; then (9... (21) Proceed north and then generally southeast along the 3,600-foot elevation line that runs parallel... elevation line that runs north of the San Andreas Rift Zone to its intersection with the section 16 east...

  4. 27 CFR 9.212 - Leona Valley.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... approximately 0.25 mile to its intersection with a trail and the 3,800-foot elevation line, T6N, R13W; then (9... (21) Proceed north and then generally southeast along the 3,600-foot elevation line that runs parallel... elevation line that runs north of the San Andreas Rift Zone to its intersection with the section 16 east...

  5. TIMEDELN: A programme for the detection and parametrization of overlapping resonances using the time-delay method

    NASA Astrophysics Data System (ADS)

    Little, Duncan A.; Tennyson, Jonathan; Plummer, Martin; Noble, Clifford J.; Sunderland, Andrew G.

    2017-06-01

    TIMEDELN implements the time-delay method of determining resonance parameters from the characteristic Lorentzian form displayed by the largest eigenvalues of the time-delay matrix. TIMEDELN constructs the time-delay matrix from input K-matrices and analyses its eigenvalues. This new version implements multi-resonance fitting and may be run serially or as a high-performance parallel code with three levels of parallelism. TIMEDELN takes K-matrices from a scattering calculation, either read from a file or calculated on a dynamically adjusted grid, and calculates the time-delay matrix. This is then diagonalized, with the largest eigenvalue representing the longest time delay experienced by the scattering particle. A resonance shows up as a characteristic Lorentzian form in the time delay: the programme searches the time-delay eigenvalues for maxima and traces resonances when they pass through different eigenvalues, separating overlapping resonances. It also performs the fitting of the calculated data to the Lorentzian form and outputs resonance positions and widths. Any remaining overlapping resonances can be fitted jointly. The branching ratios of decay into the open channels can also be found. The parallel code modules are abstracted from the main physics code and can be used independently.
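    For readers unfamiliar with the method, the standard definitions behind the abstract (quoted from the general literature, not from the code itself) are Smith's time-delay matrix, built from the energy derivative of the S-matrix, whose largest eigenvalue near an isolated resonance of position E_r and width Gamma follows a Lorentzian (q_bg denotes a slowly varying background):

        Q(E) = -i\hbar\, S^{\dagger}(E)\,\frac{\mathrm{d}S(E)}{\mathrm{d}E},
        \qquad
        q_{\max}(E) \approx q_{\mathrm{bg}} + \frac{\hbar\,\Gamma}{(E - E_r)^{2} + \Gamma^{2}/4}

    The eigenvalue thus peaks at 4\hbar/\Gamma at E = E_r, and fitting the peak yields the resonance position and width.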

  6. Reconfigurable Model Execution in the OpenMDAO Framework

    NASA Technical Reports Server (NTRS)

    Hwang, John T.

    2017-01-01

    NASA's OpenMDAO framework facilitates constructing complex models and computing their derivatives for multidisciplinary design optimization. Decomposing a model into components that follow a prescribed interface enables OpenMDAO to assemble multidisciplinary derivatives from the component derivatives using what amounts to the adjoint method, direct method, chain rule, global sensitivity equations, or any combination thereof, using the MAUD architecture. OpenMDAO also handles the distribution of processors among the disciplines by hierarchically grouping the components, and it automates the data transfer between components that are on different processors. These features have made OpenMDAO useful for applications in aircraft design, satellite design, wind turbine design, and aircraft engine design, among others. This paper presents new algorithms for OpenMDAO that enable reconfigurable model execution. This concept refers to dynamically changing, during execution, one or more of: the variable sizes, solution algorithm, parallel load balancing, or set of variables-i.e., adding and removing components, perhaps to switch to a higher-fidelity sub-model. Any component can reconfigure at any point, even when running in parallel with other components, and the reconfiguration algorithm presented here performs the synchronized updates to all other components that are affected. A reconfigurable software framework for multidisciplinary design optimization enables new adaptive solvers, adaptive parallelization, and new applications such as gradient-based optimization with overset flow solvers and adaptive mesh refinement. Benchmarking results demonstrate the time savings for reconfiguration compared to setting up the model again from scratch, which can be significant in large-scale problems. Additionally, the new reconfigurability feature is applied to a mission profile optimization problem for commercial aircraft where both the parametrization of the mission profile and the time discretization are adaptively refined, resulting in computational savings of roughly 10% and the elimination of oscillations in the optimized altitude profile.

  7. Efficient dynamic simulation for multiple chain robotic mechanisms

    NASA Technical Reports Server (NTRS)

    Lilly, Kathryn W.; Orin, David E.

    1989-01-01

    An efficient O(mN) algorithm for dynamic simulation of simple closed-chain robotic mechanisms is presented, where m is the number of chains and N is the number of degrees of freedom for each chain. It is based on computation of the operational-space inertia matrix (6 x 6) for each chain as seen by the body, load, or object. Also, computation of the chain dynamics, when opened at one end, is required, and the most efficient algorithm is used for this purpose. Parallel implementation of the dynamics for each chain results in an O(N) + O(log2 m + 1) algorithm.

  8. An asymptotic induced numerical method for the convection-diffusion-reaction equation

    NASA Technical Reports Server (NTRS)

    Scroggs, Jeffrey S.; Sorensen, Danny C.

    1988-01-01

    A parallel algorithm for the efficient solution of a time dependent reaction convection diffusion equation with small parameter on the diffusion term is presented. The method is based on a domain decomposition that is dictated by singular perturbation analysis. The analysis is used to determine regions where certain reduced equations may be solved in place of the full equation. Parallelism is evident at two levels. Domain decomposition provides parallelism at the highest level, and within each domain there is ample opportunity to exploit parallelism. Run time results demonstrate the viability of the method.

  9. Implementations of BLAST for parallel computers.

    PubMed

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared-memory machine Cray Y-MP 8/864 and the distributed-memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.
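    Database searching of this kind is close to embarrassingly parallel, which is why a moderate number of processors suffices. A hedged sketch of the strategy (not the actual port) using Python's multiprocessing, with a dummy scoring function standing in for BLAST itself:

        from multiprocessing import Pool

        def search_chunk(args):
            query, chunk = args
            # Placeholder for real BLAST scoring of `query` against each sequence.
            return [seq for seq in chunk if query[0] in seq]

        def parallel_search(query, database, n_workers=4):
            # Split the database round-robin into one chunk per worker and
            # merge the per-chunk hit lists at the end.
            chunks = [database[i::n_workers] for i in range(n_workers)]
            with Pool(n_workers) as pool:
                parts = pool.map(search_chunk, [(query, c) for c in chunks])
            return [hit for part in parts for hit in part]

        if __name__ == "__main__":
            print(parallel_search("MKT", ["MKTAYIAK", "GGAVLLPK", "MMKTPLRS"]))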

  10. A C++ Thread Package for Concurrent and Parallel Programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jie Chen; William Watson

    1999-11-01

    Recently thread libraries have become a common entity on various operating systems such as Unix, Windows NT and VxWorks. Those thread libraries offer significant performance enhancement by allowing applications to use multiple threads running either concurrently or in parallel on multiprocessors. However, the incompatibilities between native libraries introduce challenges for those who wish to develop portable applications.

  11. drPACS: A Simple UNIX Execution Pipeline

    NASA Astrophysics Data System (ADS)

    Teuben, P.

    2011-07-01

    We describe a very simple yet flexible and effective pipeliner for UNIX commands. It creates a Makefile to define a set of serially dependent commands. The commands in the pipeline share a common set of parameters by which they can communicate. Commands must follow a simple convention to retrieve and store parameters. Pipeline parameters can optionally be made persistent across multiple runs of the pipeline. Tools were added to simplify running a large series of pipelines, which can then also be run in parallel.
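    The core trick, serially dependent stages encoded as a generated Makefile so that make handles ordering and restarts, can be sketched in a few lines of Python (the stage commands below are invented, and drPACS's parameter-passing convention is not reproduced):

        # Each pipeline step depends on the previous one; `make` then runs the
        # chain in order and restarts cleanly after a failed or interrupted step.
        steps = ["fetch.sh", "calibrate.sh", "reduce.sh", "plot.sh"]

        with open("Makefile", "w") as mk:
            mk.write("all: step%d\n\n" % len(steps))
            for i, cmd in enumerate(steps, start=1):
                dep = "step%d" % (i - 1) if i > 1 else ""
                mk.write("step%d: %s\n\t./%s && touch step%d\n\n" % (i, dep, cmd, i))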

  12. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    2001-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  13. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    1999-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  14. Seismic anisotropy and large-scale deformation of the Eastern Alps

    NASA Astrophysics Data System (ADS)

    Bokelmann, Götz; Qorbani, Ehsan; Bianchi, Irene

    2013-12-01

    Mountain chains at the Earth's surface result from deformation processes within the Earth. Such deformation processes can be observed by seismic anisotropy, via the preferred alignment of elastically anisotropic minerals. The Alps show complex deformation at the Earth's surface. In contrast, we show here that observations of seismic anisotropy suggest a relatively simple pattern of internal deformation. Together with earlier observations from the Western Alps, the SKS shear-wave splitting observations presented here show one of the clearest examples yet of mountain chain-parallel fast orientations worldwide, with a simple pattern nearly parallel to the trend of the mountain chain. In the Eastern Alps, the fast orientations do not connect with neighboring mountain chains, neither the present-day Carpathians, nor the present-day Dinarides. In that region, the lithosphere is thin and the observed anisotropy thus resides within the asthenosphere. The deformation is consistent with the eastward extrusion toward the Pannonian basin that was previously suggested based on seismicity and surface geology.

  15. Anisotropic upper critical magnetic fields in Rb 2 Cr 3 As 3 superconductor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Zhang-Tu; Liu, Yi; Bao, Jin-Ke

    Rb2Cr3As3 is a structurally one-dimensional superconductor containing Cr3As3 chains, with a superconducting transition temperature of Tc = 4.8 K. Here we report electrical resistance measurements on Rb2Cr3As3 single crystals, under magnetic fields up to 29.5 T and at temperatures down to 0.36 K, from which the upper critical fields, Hc2(T), can be obtained over a broad temperature range. For fields parallel to the Cr3As3 chains, Hc2∥(T) is paramagnetically limited, with an initial slope of μ0 dHc2∥/dT|Tc = -16 T K^-1 and a zero-temperature upper critical field of μ0Hc2∥(0) = 17.5 T. For fields perpendicular to the Cr3As3 chains, however, Hc2⊥(T) is limited only by the orbital pair-breaking effect, with μ0 dHc2⊥/dT|Tc = -3 T K^-1. As a consequence, the anisotropy γH = Hc2∥/Hc2⊥ decreases sharply near Tc and reverses below 2 K. Remarkably, the low-temperature Hc2⊥(T), down to 0.075 Tc, continues to increase linearly up to more than three times the Pauli paramagnetic limit, which strongly suggests dominant spin-triplet superconductivity in Rb2Cr3As3.

  16. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C-based parallel language for architecture-adaptive programming, aCe, allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of aCe and present a signal processing application (FFT).

  17. Lg Attenuation Anisotropy Across the Western US

    NASA Astrophysics Data System (ADS)

    Phillips, W. S.; Rowe, C. A.; Stead, R. J.; Begnaud, M. L.

    2017-12-01

    The USArray has allowed us to map seismic attenuation of local and regional phases to unprecedented spatial extent and resolution. Following standard mantle Pn velocity anisotropy methods, we have incorporated azimuthal anisotropy into our tomographic inversion of high-frequency Lg amplitudes. The Lg is a crustal shear phase made up of many trapped modes, thus results can be considered to be crustal averages. Azimuthal anisotropy reduces residual variance by just over 10% for 1.5-3 Hz Lg. We observe a median anisotropic variation of 12%, and a high of 50% in the Salton Trough. Low attenuation (high-Q) directions run parallel to topographic fabric and major strike slip faults in tectonically active areas, and often run parallel to mantle shear wave splitting directions in stable regions. Tradeoffs are of concern, and synthetic tests show that elongated attenuation anomalies will produce anisotropy artifacts, but of factors 2-3 times lower than observations. In particular, the strength of a long, narrow high-Q anomaly will trade off with high-Q directions parallel to the long axis, while an elongated low-Q anomaly will trade off with high-Q directions perpendicular to the long axis. We observe an elongated low-Q anomaly associated with the Walker Lane; however, observed high-Q directions run parallel to the long axis of this anomaly, opposite to the tradeoff effect, supporting the anisotropic observation, and implying that the effect may be underestimated. Further, we observe an elongated high-Q anomaly associated with the Great Valley and Sierra Nevada that runs across the long axis, again opposite to the tradeoff effect. This study was performed using waveforms, event locations and phase picks made available by IRIS, NEIC and ANF, and processing was done using semi-automated means, thus this is a technique that can be applied quickly to study crustal anisotropy over large areas when appropriate station density is available.

  18. The Mendeleev Crater chain: A description and discussion of origin

    NASA Technical Reports Server (NTRS)

    Eppler, D.; Heiken, G.

    1974-01-01

    A 113-kilometer-long crater chain on the floor of Mendeleev Crater is the best morphological example of several similar chains on the lunar far side. Age relationships relative to Mendeleev Crater indicate that it is a younger feature that may have developed over a fault parallel to the lunar grid system. The dumbbell shape of the chain may be related to a differential stress along a fault crossing the floor that resulted in varying resistance to magma invasion.

  19. Probing amyloid fibril formation of the NFGAIL peptide by computer simulations

    NASA Astrophysics Data System (ADS)

    Melquiond, Adrien; Gelly, Jean-Christophe; Mousseau, Normand; Derreumaux, Philippe

    2007-02-01

    Amyloid fibril formation, as observed in Alzheimer's disease and type II diabetes, is currently described by a nucleation-condensation mechanism, but the details of the process preceding the formation of the nucleus are still lacking. In this study, using an activation-relaxation technique coupled to a generic energy model, we explore the aggregation pathways of 12 chains of the hexapeptide NFGAIL. The simulations show, starting from a preformed parallel dimer and ten disordered chains, that the peptides form essentially amorphous oligomers or more rarely ordered β-sheet structures where the peptides adopt a parallel orientation within the sheets. Comparison between the simulations indicates that a dimer is not a sufficient seed for avoiding amorphous aggregates and that there is a critical threshold in the number of connections between the chains above which exploration of amorphous aggregates is preferred.

  20. Polymer scaling and dynamics in steady-state sedimentation at infinite Péclet number.

    PubMed

    Lehtola, V; Punkkinen, O; Ala-Nissila, T

    2007-11-01

    We consider the static and dynamical behavior of a flexible polymer chain under steady-state sedimentation using analytic arguments and computer simulations. The model system comprises a single coarse-grained polymer chain of N segments, which resides in a Newtonian fluid as described by the Navier-Stokes equations. The chain is driven into nonequilibrium steady state by gravity acting on each segment. The equations of motion for the segments and the Navier-Stokes equations are solved simultaneously using an immersed boundary method, where thermal fluctuations are neglected. To characterize the chain conformation, we consider its radius of gyration RG(N). We find that the presence of gravity explicitly breaks the spatial symmetry, leading to anisotropic scaling of the components of RG with N along the direction of gravity, RG,∥, and perpendicular to it, RG,⊥. We numerically estimate the corresponding anisotropic scaling exponents ν∥ ≈ 0.79 and ν⊥ ≈ 0.45, which differ significantly from the equilibrium scaling exponent νe = 0.588 in three dimensions. This indicates that on the average, the chain becomes elongated along the sedimentation direction for large enough N. We present a generalization of the Flory scaling argument, which is in good agreement with the numerical results. It also reveals an explicit dependence of the scaling exponents on the Reynolds number. To study the dynamics of the chain, we compute its effective diffusion coefficient D(N), which does not contain Brownian motion. For the range of values of N used here, we find that both the parallel and perpendicular components of D increase with the chain length N, in contrast to the case of thermal diffusion in equilibrium. This is caused by the fluid-driven fluctuations in the internal configuration of the polymer that are magnified as polymer size becomes larger.
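    For context, the classical Flory argument that the paper generalizes balances entropic elasticity against two-body repulsion (standard textbook material, stated here in three dimensions with segment length a and excluded-volume parameter v):

        F(R) \sim \frac{R^{2}}{N a^{2}} + v\,\frac{N^{2}}{R^{3}},
        \qquad
        \frac{\partial F}{\partial R} = 0
        \;\Rightarrow\;
        R \sim \left(v\,a^{2}\right)^{1/5} N^{3/5}

    This gives ν = 3/5, close to the accurate three-dimensional value νe = 0.588 quoted above; the paper's generalization adds the driving-flow terms to this balance, which is how the Reynolds-number dependence of ν∥ and ν⊥ enters.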

  1. The structure of denisovite, a fibrous nanocrystalline polytypic disordered ‘very complex’ silicate, studied by a synergistic multi-disciplinary approach employing methods of electron crystallography and X-ray powder diffraction

    PubMed Central

    Schowalter, Marco; Schmidt, Martin U.; Czank, Michael; Depmeier, Wulf; Rosenauer, Andreas

    2017-01-01

    Denisovite is a rare mineral occurring as aggregates of fibres typically 200–500 nm in diameter. It was confirmed as a new mineral in 1984, but important facts about its chemical formula, lattice parameters, symmetry and structure have remained incompletely known since then. Recently obtained results from studies using microprobe analysis, X-ray powder diffraction (XRPD), electron crystallography, modelling and Rietveld refinement will be reported. The electron crystallography methods include transmission electron microscopy (TEM), selected-area electron diffraction (SAED), high-angle annular dark-field imaging (HAADF), high-resolution transmission electron microscopy (HRTEM), precession electron diffraction (PED) and electron diffraction tomography (EDT). A structural model of denisovite was developed from HAADF images and later completed on the basis of quasi-kinematic EDT data by ab initio structure solution using direct methods and least-squares refinement. The model was confirmed by Rietveld refinement. The lattice parameters are a = 31.024 (1), b = 19.554 (1) and c = 7.1441 (5) Å, β = 95.99 (3)°, V = 4310.1 (5) Å3 and space group P12/a1. The structure consists of three topologically distinct dreier silicate chains, viz. two xonotlite-like dreier double chains, [Si6O17]10−, and a tubular loop-branched dreier triple chain, [Si12O30]12−. The silicate chains occur between three walls of edge-sharing (Ca,Na) octahedra. The chains of silicate tetrahedra and the octahedra walls extend parallel to the z axis and form a layer parallel to (100). Water molecules and K+ cations are located at the centre of the tubular silicate chain. The latter also occupy positions close to the centres of eight-membered rings in the silicate chains. The silicate chains are geometrically constrained by neighbouring octahedra walls and present an ambiguity with respect to their z position along these walls, with displacements between neighbouring layers being either Δz = c/4 or −c/4. Such behaviour is typical for polytypic sequences and leads to disorder along [100]. In fact, the diffraction pattern does not show any sharp reflections with l odd, but continuous diffuse streaks parallel to a* instead. Only reflections with l even are sharp. The diffuse scattering is caused by (100) nanolamellae separated by stacking faults and twin boundaries. The structure can be described according to the order–disorder (OD) theory as a stacking of layers parallel to (100). PMID:28512570

  2. INTERNAL AMPLIFICATION CONTROL FOR USE IN QUANTITATIVE POLYMERASE CHAIN REACTION FECAL INDICATOR BACTERIA ASSAYS

    EPA Science Inventory

    Quantitative polymerase chain reaction (QPCR) can be used as a rapid method for detecting fecal indicator bacteria. Because false negative results can be caused by PCR inhibitors that co-extract with the DNA samples, an internal amplification control (IAC) should be run with eac...

  3. Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

    NASA Astrophysics Data System (ADS)

    Georgiev, K.; Zlatev, Z.

    2010-11-01

    The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model's computational domain covers Europe and some neighbouring parts of the Atlantic Ocean, Asia and Africa. If the DEM is to be applied using fine grids, its discretization leads to a huge computational problem, which implies that such a model must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, comparison results from running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) will be presented. The main idea in the parallel version of DEM is a domain-partitioning approach. The effective use of the caches and hierarchical memories of modern computers, as well as the performance, speed-ups and efficiency achieved, will be discussed. The parallel code of DEM, created by using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are briefly presented.

  4. Comparison of the crystal structures of methyl 4-bromo-2-(methoxymethoxy)benzoate and 4-bromo-3-(methoxymethoxy)benzoic acid.

    PubMed

    Suchetan, P A; Suneetha, V; Naveen, S; Lokanath, N K; Krishna Murthy, P

    2016-04-01

    The title compounds, C10H11BrO4, (I), and C9H9BrO4, (II), are derivatives of bromohydroxybenzoic acids. Compound (II) crystallizes with two independent molecules (A and B) in the asymmetric unit. In both (I) and (II), the O-CH2-O-CH3 side chain is not in its fully extended conformation; the O-C-O-C torsion angle is 67.3 (3)° in (I), and -65.8 (3) and -74.1 (3)° in molecules A and B, respectively, in compound (II). In the crystal of (I), molecules are linked by C-H⋯O hydrogen bonds, forming C(5) chains along [010]. The chains are linked by short Br⋯O contacts [3.047 (2) Å], forming sheets parallel to the bc plane. The sheets are linked via C-H⋯π interactions, forming a three-dimensional architecture. In the crystal of (II), molecules A and B are linked to form R2(2)(8) dimers via two strong O-H⋯O hydrogen bonds. These dimers are linked into ⋯A-B⋯A-B⋯A-B⋯ [C2(2)(15)] chains along [011] by C-H⋯O hydrogen bonds. The chains are linked by slipped parallel π-π interactions [intercentroid distances = 3.6787 (18) and 3.8431 (17) Å], leading to the formation of slabs parallel to the bc plane.

  5. Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; Jong, Wibe de

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65x better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

  6. Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; de Jong, Wibe

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

  7. Near-ridge seamount chains in the northeastern Pacific Ocean

    NASA Astrophysics Data System (ADS)

    Clague, David A.; Reynolds, Jennifer R.; Davis, Alicé S.

    2000-07-01

    High-resolution bathymetry and side-scan data of the Vance, President Jackson, and Taney near-ridge seamount chains in the northeast Pacific were collected with a hull-mounted 30-kHz sonar. The central volcanoes in each chain consist of truncated cone-shaped volcanoes with steep sides and nearly flat tops. Several areas are characterized by frequent small eruptions that result in disorganized volcanic regions with numerous small cones and volcanic ridges but no organized truncated conical structure. Several volcanoes are crosscut by ridge-parallel faults, showing that they formed within 30-40 km of the ridge axis where ridge-parallel faulting is still active. Magmas that built the volcanoes were probably transported through the crust along active ridge-parallel faults. The volcanoes range in volume from 11 to 187 km3, and most have one or more multiple craters and calderas that modify their summits and flanks. The craters (<1 km diameter) and calderas (>1 km diameter) range from small pit craters to calderas as large as 6.5×8.5 km, although most are 2-4 km across. Crosscutting relationships commonly show a sequence of calderas stepping toward the ridge axis. The calderas overlie crustal magma chambers at least as large as those that underlie Kilauea and Mauna Loa Volcanoes in Hawaii, perhaps 4-5 km in diameter and ~1-3 km below the surface. The nearly flat tops of many of the volcanoes have remnants of centrally located summit shields, suggesting that their flat tops did not form from eruptions along circumferential ring faults but instead form by filling and overflowing of earlier large calderas. The lavas retain their primitive character by residing in such chambers for only short time periods prior to eruption. Stored magmas are withdrawn, probably as dikes intruded into the adjacent ocean crust along active ridge-parallel faults, triggering caldera collapse, or solidified before the next batch of magma is intruded into the volcano, probably 1000-10,000 years later. The chains are oriented parallel to subaxial asthenospheric flow rather than absolute or relative plate motion vectors. The subaxial asthenospheric flow model yields rates of volcanic migration of 3.4, 3.3 and 5.9 cm yr^-1 for the Vance, President Jackson, and Taney Seamounts, respectively. The modeled lifespans of the individual volcanoes in the three chains vary from 75 to 95 kyr. These lifespans, coupled with the geologic observations based on the bathymetry, allow us to construct models of magma supply through time for the volcanoes in the three chains.

  8. Synthesis, structure, and physicochemical investigations of the new α-Cu0.50TiO(PO4) oxyphosphate

    NASA Astrophysics Data System (ADS)

    Benmokhtar, S.; Belmal, H.; El Jazouli, A.; Chaminade, J. P.; Gravereau, P.; Pechev, S.; Grenier, J. C.; Villeneuve, G.; de Waal, D.

    2007-02-01

    The room-temperature crystal structure of a new Cu(II) oxyphosphate, α-Cu0.50TiO(PO4), was determined from X-ray single-crystal diffraction data, in the monoclinic system, space group P21/c. The refinement from 5561 independent reflections led to the following parameters: a = 7.5612 (4) Å, b = 7.0919 (4) Å, c = 7.4874 (4) Å, β = 122.25 (1)°, Z = 4, with final R = 0.0198, wR = 0.0510. The structure of α-Cu0.50TiO(PO4) can be described as a TiOPO4 framework constituted by chains of tilted corner-sharing [TiO6] octahedra running parallel to the c-axis and cross-linked by phosphate [PO4] tetrahedra, where one half of the octahedral cavities created are occupied by Cu atoms. Ti atoms are displaced from the centres of the octahedra in alternating long (2.308 Å) and short (1.722 Å) Ti-O(1) bonds along the chains. Such O(1) atoms, not linked to P atoms, justify the oxyphosphate formulation α-Cu0.50TiO(PO4). The divalent Cu2+ cations occupy a Jahn-Teller-distorted octahedron sharing two faces with two [TiO6] octahedra. EPR and optical measurements are in good agreement with the structural data. The X-ray diffraction results are supported by Raman and infrared spectroscopy studies that confirmed the existence of the infinite -Ti-O-Ti-O-Ti- chains. α-Cu0.50TiO(PO4) shows Curie-Weiss paramagnetic behavior in the temperature range 4-80 K.

  9. A one-dimensional zinc(II) coordination polymer with a three-dimensional supramolecular architecture incorporating 1-[(1H-benzimidazol-2-yl)methyl]-1H-tetrazole and adipate.

    PubMed

    Liu, Chun Li; Huang, Qiu Ying; Meng, Xiang Ru

    2016-12-01

    The synthesis of coordination polymers or metal-organic frameworks (MOFs) has attracted considerable interest owing to the interesting structures and potential applications of these compounds. It is still a challenge to predict the exact structures and compositions of the final products. A new one-dimensional coordination polymer, catena-poly[[[bis{1-[(1H-benzimidazol-2-yl)methyl]-1H-tetrazole-κN3}zinc(II)]-μ-hexane-1,6-dicarboxylato-κ4O1,O1':O6,O6'] monohydrate], {[Zn(C6H8O4)(C9H8N6)2]·H2O}n, has been synthesized by the reaction of Zn(Ac)2 (Ac is acetate) with 1-[(1H-benzimidazol-2-yl)methyl]-1H-tetrazole (bimt) and adipic acid (H2adi) at room temperature. In the polymer, each Zn(II) ion exhibits an irregular octahedral ZnN2O4 coordination geometry and is coordinated by two N atoms from two symmetry-related bimt ligands and four O atoms from two symmetry-related dianionic adipate ligands. Zn(II) ions are connected by adipate ligands into a one-dimensional chain which runs parallel to the c axis. The bimt ligands coordinate to the Zn(II) ions in a monodentate mode on both sides of the main chain. In the crystal, the one-dimensional chains are further connected through N-H⋯O hydrogen bonds, leading to a three-dimensional supramolecular architecture. In addition, the title polymer exhibits fluorescence, with emissions at 334 and 350 nm in the solid state at room temperature.

  10. VAC: Versatile Advection Code

    NASA Astrophysics Data System (ADS)

    Tóth, Gábor; Keppens, Rony

    2012-07-01

    The Versatile Advection Code (VAC) is a freely available general hydrodynamic and magnetohydrodynamic simulation software that works in 1, 2 or 3 dimensions on Cartesian and logically Cartesian grids. VAC runs on any Unix/Linux system with a Fortran 90 (or 77) compiler and Perl interpreter. VAC can run on parallel machines using either the Message Passing Interface (MPI) library or a High Performance Fortran (HPF) compiler.

  11. Decision tables and rule engines in organ allocation systems for optimal transparency and flexibility.

    PubMed

    Schaafsma, Murk; van der Deijl, Wilfred; Smits, Jacqueline M; Rahmel, Axel O; de Vries Robbé, Pieter F; Hoitsma, Andries J

    2011-05-01

    Organ allocation systems have become complex and difficult to comprehend. We introduced decision tables to specify the rules of allocation systems for different organs. A rule engine with decision tables as input was tested for the Kidney Allocation System (ETKAS). We compared this rule engine with the currently used ETKAS by running 11,000 historical match runs and by running the rule engine in parallel with ETKAS on our allocation system. Decision tables were easy to implement and successful in verifying correctness, completeness, and consistency. The outcomes of the 11,000 historical matches in the rule engine and ETKAS were exactly the same. Running the rule engine simultaneously, in parallel and in real time with ETKAS, also produced no differences. Specifying organ allocation rules in decision tables is already a great step forward in enhancing the clarity of the systems. Using these tables as rule-engine input for matches, however, optimizes the flexibility, simplicity and clarity of the whole process, from specification to the performed matches; in addition, this new method allows well-controlled simulations.
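    The flavor of the approach can be conveyed with a toy rule engine in Python; the columns and point values below are invented for illustration and are not the ETKAS rules. The policy lives in a data table that can be inspected and verified, while the engine merely scans it:

        # Decision table as data: (condition on the match, points awarded).
        # Values are purely illustrative.
        DECISION_TABLE = [
            (lambda m: m["hla_mismatches"] == 0, 400),
            (lambda m: m["waiting_years"] >= 5,  100),
            (lambda m: m["same_country"],         75),
        ]

        def score(match):
            # Apply every applicable row; the table, not the code, defines
            # the allocation policy, which keeps the rules transparent.
            return sum(points for cond, points in DECISION_TABLE if cond(match))

        print(score({"hla_mismatches": 0, "waiting_years": 6, "same_country": True}))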

  12. Streaming data analytics via message passing with application to graph algorithms

    DOE PAGES

    Plimpton, Steven J.; Shead, Tim

    2014-05-06

    The need to process streaming data, which arrives continuously at high volume in real time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.
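    A minimal stand-in for this dataflow style, independent processes consuming a stream of datums from an upstream producer, can be written with Python's multiprocessing (PHISH itself couples its processes via MPI or ZMQ; the edge stream below is invented):

        from multiprocessing import Process, Queue

        def source(out_q):
            for edge in [(1, 2), (2, 3), (1, 3), (3, 4)]:
                out_q.put(edge)          # emit a stream of datums
            out_q.put(None)              # end-of-stream marker

        def sink(in_q):
            count = 0
            while True:
                datum = in_q.get()
                if datum is None:
                    break
                count += 1               # a real stage would update graph state here
            print("processed", count, "datums")

        if __name__ == "__main__":
            q = Queue()
            procs = [Process(target=source, args=(q,)),
                     Process(target=sink, args=(q,))]
            for p in procs:
                p.start()
            for p in procs:
                p.join()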

  13. Compressed quantum computation using a remote five-qubit quantum computer

    NASA Astrophysics Data System (ADS)

    Hebenstreit, M.; Alsina, D.; Latorre, J. I.; Kraus, B.

    2017-05-01

    The notion of compressed quantum computation is employed to simulate the Ising interaction of a one-dimensional chain consisting of n qubits using the universal IBM cloud quantum computer running on log2(n) qubits. The external field parameter that controls the quantum phase transition of this model translates into particular settings of the quantum gates that generate the circuit. We measure the magnetization, which displays the quantum phase transition, on a two-qubit system, which simulates a four-qubit Ising chain, and show its agreement with the theoretical prediction within a certain error. We also discuss the relevant point of how to assess errors when using a cloud quantum computer with a limited amount of runs. As a solution, we propose to use validating circuits, that is, to run independent controlled quantum circuits of similar complexity to the circuit of interest.
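
    For readers who want a reference point for the measured quantity, the sketch below computes the transverse magnetization of a small Ising chain by exact diagonalization with numpy. It is a classical check of the physics only, not an implementation of the compressed log2(n)-qubit circuit; the chain length and field values are arbitrary.

      # Exact-diagonalization reference for the transverse-field Ising
      # magnetization of an n-qubit open chain (here n = 4).
      import numpy as np

      I2 = np.eye(2)
      X = np.array([[0., 1.], [1., 0.]])
      Z = np.array([[1., 0.], [0., -1.]])

      def op(site, P, n):
          """Embed the one-qubit operator P at `site` in an n-qubit chain."""
          out = np.array([[1.0]])
          for k in range(n):
              out = np.kron(out, P if k == site else I2)
          return out

      def magnetization(g, n=4):
          """Ground-state <sum_k X_k>/n for H = -sum Z_k Z_{k+1} - g sum X_k."""
          H = sum(-op(k, Z, n) @ op(k + 1, Z, n) for k in range(n - 1))
          H += sum(-g * op(k, X, n) for k in range(n))
          w, v = np.linalg.eigh(H)
          ground = v[:, 0]
          Mx = sum(op(k, X, n) for k in range(n)) / n
          return float(ground @ Mx @ ground)

      for g in (0.5, 1.0, 2.0):   # sweeps across the critical region g ~ 1
          print(g, round(magnetization(g), 3))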

  14. Gorgonum Chaos

    NASA Technical Reports Server (NTRS)

    2002-01-01

    (Released 08 April 2002) This image shows the cratered highlands of Terra Sirenum in the southern hemisphere. Near the center of the image, running from left to right, one can see long parallel to semi-parallel fractures or troughs called graben. Mars Global Surveyor initially discovered gullies on the south-facing wall of these fractures. This image is located at 38°S, 174°W (186°E).

  15. Long-range interactions and parallel scalability in molecular simulations

    NASA Astrophysics Data System (ADS)

    Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

    2007-01-01

    Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested scalability on four different networks: Infiniband, Gigabit Ethernet, Fast Ethernet, and a nearly uniform memory architecture in which CPUs communicate by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

  16. Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker, Jove, while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove at a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluation, partitioning, processor reassignment, cost evaluation, and a final decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.
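
    The flag-and-decide protocol described above maps naturally onto a small MPI program. The sketch below, written with mpi4py (the paper's implementation is not Python), shows only the communication skeleton: rank 0 plays the Jove decision maker and the remaining ranks run the solver; the load metric and the repartitioning logic are placeholders.

      # Skeleton of the Jove pattern with mpi4py; run with e.g.
      #   mpirun -n 4 python jove_sketch.py
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()
      FLAG_EVERY = 10                     # iterations between flags

      if rank == 0:                       # "Jove": the decision maker
          for _ in range(comm.Get_size() - 1):
              info = comm.recv(source=MPI.ANY_SOURCE, tag=1)
              # ...evaluate a candidate repartition against `info` here...
              comm.send({"rebalance": False}, dest=info["rank"], tag=2)
      else:                               # CFD workers
          load = 0.0
          for it in range(FLAG_EVERY):
              load += 1.0                 # stand-in for one solver iteration
          comm.send({"rank": rank, "load": load}, dest=0, tag=1)
          decision = comm.recv(source=0, tag=2)
          # workers keep the current partition unless told otherwise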

  18. Parallel design of JPEG-LS encoder on graphics processing units

    NASA Astrophysics Data System (ADS)

    Duan, Hao; Fang, Yong; Huang, Bormin

    2012-01-01

    With recent technical advances in graphics processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on an NVIDIA GPU using the compute unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with 26.3x speedup over its original CPU code.
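
    Of the CUDA techniques listed, parallel prefix sum is the most self-contained, so a host-side sketch may help. The code below is the classic work-efficient Blelloch exclusive scan written sequentially with numpy; a CUDA kernel performs the same up-sweep and down-sweep phases with one thread per element pair. It illustrates the algorithm only, not the paper's encoder.

      # Work-efficient (Blelloch) exclusive scan; n must be a power of two.
      import numpy as np

      def blelloch_exclusive_scan(a):
          x = np.array(a, dtype=np.int64)
          n = len(x)
          d = 1
          while d < n:                    # up-sweep (reduction) phase
              x[2*d-1::2*d] += x[d-1::2*d]
              d *= 2
          x[-1] = 0
          d = n // 2
          while d >= 1:                   # down-sweep phase
              left = x[d-1::2*d].copy()
              x[d-1::2*d] = x[2*d-1::2*d]
              x[2*d-1::2*d] += left
              d //= 2
          return x

      print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
      # -> [ 0  3  4 11 11 15 16 22]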

  19. Characterizing parallel file-access patterns on a large-scale multiprocessor

    NASA Technical Reports Server (NTRS)

    Purakayastha, A.; Ellis, Carla; Kotz, David; Nieuwejaar, Nils; Best, Michael L.

    1995-01-01

    High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.

  20. 50 GFlops molecular dynamics on the Connection Machine 5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lomdahl, P.S.; Tamayo, P.; Groenbech-Jensen, N.

    1993-12-31

    The authors present timings and performance numbers for a new short range three dimensional (3D) molecular dynamics (MD) code, SPaSM, on the Connection Machine-5 (CM-5). They demonstrate that runs with more than 10^8 particles are now possible on massively parallel MIMD computers. To the best of their knowledge this is at least an order of magnitude more particles than what has previously been reported. Typical production runs show sustained performance (including communication) in the range of 47-50 GFlops on a 1024 node CM-5 with vector units (VUs). The speed of the code scales linearly with the number of processors and with the number of particles and shows 95% parallel efficiency in the speedup.

  1. Implementation of the force decomposition machine for molecular dynamics simulations.

    PubMed

    Borštnik, Urban; Miller, Benjamin T; Brooks, Bernard R; Janežič, Dušanka

    2012-09-01

    We present the design and implementation of the force decomposition machine (FDM), a cluster of personal computers (PCs) that is tailored to running molecular dynamics (MD) simulations using the distributed diagonal force decomposition (DDFD) parallelization method. The cluster interconnect architecture is optimized for the communication pattern of the DDFD method. Our implementation of the FDM relies on standard commodity components, even for networking. Although the cluster is meant for DDFD MD simulations, it remains general enough for other parallel computations. An analysis of several MD simulation runs on both the FDM and a standard PC cluster demonstrates that the FDM's interconnect architecture provides greater performance than a more general cluster interconnect. Copyright © 2012 Elsevier Inc. All rights reserved.

  2. Co-Operative Schools: A Democratic Alternative

    ERIC Educational Resources Information Center

    Audsley, Jamie; Cook, Philip

    2012-01-01

    Many fear that the pressures of running an Academy will be too great for individual schools, and that they will be forced to join chains run by private companies. These may offer hard-pressed school administrators valuable management expertise and back-office support, but seem to offer wider society little accountability and transparency. Are…

  3. Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python

    USGS Publications Warehouse

    Laura, Jason R.; Rey, Sergio J.

    2017-01-01

    Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.

  4. Grace: A cross-platform micromagnetic simulator on graphics processing units

    NASA Astrophysics Data System (ADS)

    Zhu, Ru

    2015-12-01

    A micromagnetic simulator running on graphics processing units (GPUs) is presented. Different from GPU implementations of other research groups, which predominantly run on NVidia's CUDA platform, this simulator is developed with C++ Accelerated Massive Parallelism (C++ AMP) and is hardware platform independent. It runs on GPUs from vendors including NVidia, AMD and Intel, and achieves a significant performance boost as compared to previous central processing unit (CPU) simulators, up to two orders of magnitude. The simulator paves the way for running large-scale micromagnetic simulations on both high-end workstations with dedicated graphics cards and low-end personal computers with integrated graphics cards, and is freely available to download.

  5. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    NASA Astrophysics Data System (ADS)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was Floating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.

  6. Communications oriented programming of parallel iterative solutions of sparse linear systems

    NASA Technical Reports Server (NTRS)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.

  7. A hydrodynamic mechanism for spontaneous formation of ordered drop arrays in confined shear flow

    NASA Astrophysics Data System (ADS)

    Singha, Sagnik; Zurita-Gotor, Mauricio; Loewenberg, Michael; Migler, Kalman; Blawzdziewicz, Jerzy

    2017-11-01

    It has been experimentally demonstrated that a drop monolayer driven by a confined shear flow in a Couette device can spontaneously arrange into a flow-oriented parallel chain microstructure. However, the hydrodynamic mechanism of this puzzling self-assembly phenomenon has so far eluded explanation. In a recent publication we suggested that the observed spontaneous drop ordering may arise from hydrodynamic interparticle interactions via a far-field quadrupolar Hele-Shaw flow associated with drop deformation. To verify this conjecture we have developed a simple numerical-simulation model that includes the far-field Hele-Shaw flow quadrupoles and a near-field short-range repulsion. Our simulations show that an initially disordered particle configuration self-organizes into a system of particle chains, similar to the experimentally observed drop-chain structures. The initial stage of chain formation is fast; subsequently, microstructural defects in a partially ordered system are removed by slow annealing, leading to an array of equally spaced parallel chains with a small number of defects. The microstructure evolution is analyzed using angular and spatial order parameters and correlation functions. Supported by NSF Grants No. CBET 1603627 and CBET 1603806.
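
    A toy version of such a model fits in a short script. The sketch below moves point particles under a pairwise interaction with quadrupolar angular symmetry (roughly cos 2θ / r^2) plus a short-range repulsion; the functional forms, constants, and order metric are placeholders, not the authors' actual equations.

      # Toy quadrupole-plus-repulsion dynamics in 2D (illustrative only).
      import numpy as np

      rng = np.random.default_rng(0)
      N, steps, dt, L = 40, 2000, 0.005, 10.0
      pos = rng.uniform(0, L, size=(N, 2))

      def velocities(pos):
          v = np.zeros_like(pos)
          for i in range(N):
              d = pos - pos[i]                      # separations i -> j
              d[:, 0] -= L * np.round(d[:, 0] / L)  # periodic along flow (x)
              r2 = (d ** 2).sum(axis=1)
              r2[i] = np.inf                        # no self-interaction
              theta = np.arctan2(d[:, 1], d[:, 0])
              quad = np.cos(2 * theta) / r2         # quadrupolar far field
              rep = 1.0 / r2 ** 3                   # short-range repulsion
              v[i] = ((quad - rep)[:, None] * d).sum(axis=0)
          return np.clip(v, -5.0, 5.0)              # crude stability guard

      for _ in range(steps):
          pos += dt * velocities(pos)               # explicit Euler step

      # crude order metric: nearest-neighbour bonds aligned with the flow
      dist2 = ((pos[:, None] - pos[None]) ** 2).sum(-1) + np.eye(N) * 1e9
      bond = pos[np.argmin(dist2, axis=1)] - pos
      print(np.mean(np.abs(np.cos(np.arctan2(bond[:, 1], bond[:, 0]))) > 0.9))

    The attraction is strongest for neighbours aligned with the flow axis (where cos 2θ is positive), which is the qualitative ingredient that drives chaining in the authors' full model.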

  8. Node Resource Manager: A Distributed Computing Software Framework Used for Solving Geophysical Problems

    NASA Astrophysics Data System (ADS)

    Lawry, B. J.; Encarnacao, A.; Hipp, J. R.; Chang, M.; Young, C. J.

    2011-12-01

    With the rapid growth of multi-core computing hardware, it is now possible for scientific researchers to run complex, computationally intensive software on affordable, in-house commodity hardware. Multi-core CPUs (Central Processing Unit) and GPUs (Graphics Processing Unit) are now commonplace in desktops and servers. Developers today have access to extremely powerful hardware that enables the execution of software that could previously only be run on expensive, massively-parallel systems. It is no longer cost-prohibitive for an institution to build a parallel computing cluster consisting of commodity multi-core servers. In recent years, our research team has developed a distributed, multi-core computing system and used it to construct global 3D earth models using seismic tomography. Traditionally, computational limitations forced certain assumptions and shortcuts in the calculation of tomographic models; however, with the recent rapid growth in computational hardware including faster CPU's, increased RAM, and the development of multi-core computers, we are now able to perform seismic tomography, 3D ray tracing and seismic event location using distributed parallel algorithms running on commodity hardware, thereby eliminating the need for many of these shortcuts. We describe Node Resource Manager (NRM), a system we developed that leverages the capabilities of a parallel computing cluster. NRM is a software-based parallel computing management framework that works in tandem with the Java Parallel Processing Framework (JPPF, http://www.jppf.org/), a third party library that provides a flexible and innovative way to take advantage of modern multi-core hardware. NRM enables multiple applications to use and share a common set of networked computers, regardless of their hardware platform or operating system. Using NRM, algorithms can be parallelized to run on multiple processing cores of a distributed computing cluster of servers and desktops, which results in a dramatic speedup in execution time. NRM is sufficiently generic to support applications in any domain, as long as the application is parallelizable (i.e., can be subdivided into multiple individual processing tasks). At present, NRM has been effective in decreasing the overall runtime of several algorithms: 1) the generation of a global 3D model of the compressional velocity distribution in the Earth using tomographic inversion, 2) the calculation of the model resolution matrix, model covariance matrix, and travel time uncertainty for the aforementioned velocity model, and 3) the correlation of waveforms with archival data on a massive scale for seismic event detection. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  9. Distributed run of a one-dimensional model in a regional application using SOAP-based web services

    NASA Astrophysics Data System (ADS)

    Smiatek, Gerhard

    This article describes the setup of a distributed computing system in Perl. It facilitates the parallel run of a one-dimensional environmental model on a number of simple network PC hosts. The system uses Simple Object Access Protocol (SOAP) driven web services offering the model run on remote hosts and a multi-thread environment distributing the work and accessing the web services. Its application is demonstrated in a regional run of a process-oriented biogenic emission model for the area of Germany. Within a network consisting of up to seven web services implemented on Linux and MS-Windows hosts, a performance increase of approximately 400% has been reached compared to a model run on the fastest single host.
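
    The same master/worker pattern can be sketched in Python (the article's implementation uses Perl and SOAP web services). In the sketch, run_remote is a stand-in for the actual SOAP call and the host list is hypothetical.

      # Thread pool dispatching independent 1D model runs to remote hosts.
      from concurrent.futures import ThreadPoolExecutor

      HOSTS = ["http://host1/model", "http://host2/model"]   # placeholders

      def run_remote(host, cell):
          # placeholder for a SOAP/HTTP request that runs the 1D model
          # for one grid cell on `host` and returns its emission value
          return {"cell": cell, "emission": 1.0}

      def distribute(cells):
          with ThreadPoolExecutor(max_workers=2 * len(HOSTS)) as pool:
              futures = [pool.submit(run_remote, HOSTS[i % len(HOSTS)], c)
                         for i, c in enumerate(cells)]
              return [f.result() for f in futures]

      print(len(distribute(range(100))))   # one result per grid cell

    Because each grid cell is an independent model run, throughput scales with the number of hosts until the network or the slowest host becomes the bottleneck, which matches the roughly 400% speedup reported for seven hosts.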

  10. Scalable and balanced dynamic hybrid data assimilation

    NASA Astrophysics Data System (ADS)

    Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa

    2017-04-01

    Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small and that results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter EKF. Ensemble members are re-sampled every time a new set of observations is processed from a new approximation of that Gaussian distribution which makes VEnKF a dynamic assimilation method. After this a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother VEnKS. In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble but now using past iterations as surrogate observations until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. However, this can be avoided by isolating the forecast model completely from the minimization process by implementing the latter as a wrapper code whose only link to the model is calling for many parallel and totally independent model runs, all of them implemented as parallel model runs themselves. The only bottleneck in the process is the gathering and scattering of initial and final model state snapshots before and after the parallel runs which requires a very efficient and low-latency communication network. However, the volume of data communicated is small and the intervening minimization steps are only 3D-Var, which means their computational load is negligible compared with the fully parallel model runs. We present example results of scalable VEnKF with the 4D lake and shallow sea model COHERENS, assimilating simultaneously continuous in situ measurements in a single point and infrequent satellite images that cover a whole lake, with the fully scalable VEnKF.
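
    The resampling idea at the core of VEnKF can be illustrated with a few lines of numpy. The sketch keeps the estimate as a Gaussian (mean, covariance), draws an ensemble from it, propagates every member independently (the trivially parallel step), and applies a Kalman update; the model, observation operator, and noise levels are toy placeholders, not the COHERENS setup.

      # Minimal VEnKF-style analysis cycle (illustrative sketch only).
      import numpy as np

      rng = np.random.default_rng(1)

      def model(x):                     # stand-in forecast model
          return 0.95 * x + np.sin(x)

      def venkf_cycle(mean, cov, y, H, R, n_ens=50):
          ens = rng.multivariate_normal(mean, cov, size=n_ens)
          ens = np.array([model(m) for m in ens])   # independent runs
          mean_f = ens.mean(axis=0)
          A = ens - mean_f
          cov_f = A.T @ A / (n_ens - 1)             # forecast covariance
          S = H @ cov_f @ H.T + R                   # innovation covariance
          K = cov_f @ H.T @ np.linalg.inv(S)        # Kalman gain
          mean_a = mean_f + K @ (y - H @ mean_f)
          cov_a = (np.eye(len(mean)) - K @ H) @ cov_f
          return mean_a, cov_a

      mean, cov = np.zeros(3), np.eye(3)
      H, R = np.eye(3)[:1], np.eye(1) * 0.1         # observe component 0
      for obs in (1.0, 0.8, 1.2):
          mean, cov = venkf_cycle(mean, cov, np.array([obs]), H, R)
      print(mean)

    Because each member run touches no shared state, the model(m) calls can be farmed out as fully independent parallel jobs, which is the property the authors exploit to avoid the serial minimization bottleneck.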

  11. A reversible-jump Markov chain Monte Carlo algorithm for 1D inversion of magnetotelluric data

    NASA Astrophysics Data System (ADS)

    Mandolesi, Eric; Ogaya, Xenia; Campanyà, Joan; Piana Agostinetti, Nicola

    2018-04-01

    This paper presents a new computer code developed to solve the 1D magnetotelluric (MT) inverse problem using a Bayesian trans-dimensional Markov chain Monte Carlo algorithm. MT data are sensitive to the depth-distribution of rock electric conductivity (or its reciprocal, resistivity). The solution provided is a probability distribution - the so-called posterior probability distribution (PPD) for the conductivity at depth, together with the PPD of the interface depths. The PPD is sampled via a reversible-jump Markov Chain Monte Carlo (rjMcMC) algorithm, using a modified Metropolis-Hastings (MH) rule to accept or discard candidate models along the chains. As the optimal parameterization for the inversion process is generally unknown, a trans-dimensional approach is used to allow the dataset itself to indicate the most probable number of parameters needed to sample the PPD. The algorithm is tested against two simulated datasets and a set of MT data acquired in the Clare Basin (County Clare, Ireland). For the simulated datasets the correct number of conductive layers at depth and the associated electrical conductivity values are retrieved, together with reasonable estimates of the uncertainties on the investigated parameters. Results from the inversion of field measurements are compared with results obtained using a deterministic method and with well-log data from a nearby borehole. The PPD is in good agreement with the well-log data, showing as a main structure a highly conductive layer associated with the Clare Shale formation. In this study, we demonstrate that our new code goes beyond algorithms developed using a linear inversion scheme, as it can be used: (1) to by-pass the subjective choices in the 1D parameterizations, i.e. the number of horizontal layers in the 1D parameterization, and (2) to estimate realistic uncertainties on the retrieved parameters. The algorithm is implemented using a simple MPI approach, where independent chains run on isolated CPUs, to take full advantage of parallel computer architectures. In the case of a large number of data, a master/slave approach can be used, where the master CPU samples the parameter space and the slave CPUs compute forward solutions.
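
    The fixed-dimension Metropolis-Hastings core that rjMcMC builds on fits in a short script; the reversible-jump birth/death moves that change the number of layers are omitted here, and a toy forward operator stands in for the MT response. Chains are fully independent, which is what makes the one-chain-per-CPU MPI layout natural.

      # Metropolis-Hastings sampler with a symmetric Gaussian proposal
      # (illustrative core only; no trans-dimensional moves).
      import numpy as np

      rng = np.random.default_rng(2)

      def log_posterior(m, d_obs, forward, sigma=0.1):
          r = d_obs - forward(m)
          return -0.5 * np.sum(r ** 2) / sigma ** 2   # flat prior assumed

      def mh_chain(m0, d_obs, forward, n_steps=5000, step=0.1):
          m, lp = m0.copy(), log_posterior(m0, d_obs, forward)
          samples = []
          for _ in range(n_steps):
              cand = m + step * rng.standard_normal(m.shape)
              lp_cand = log_posterior(cand, d_obs, forward)
              if np.log(rng.uniform()) < lp_cand - lp:  # MH acceptance
                  m, lp = cand, lp_cand
              samples.append(m.copy())
          return np.array(samples)

      forward = lambda m: np.array([m.sum(), m[0] - m[1]])  # toy forward
      d_obs = np.array([1.0, 0.2])
      chain = mh_chain(np.zeros(2), d_obs, forward)
      print(chain[2500:].mean(axis=0))   # posterior mean after burn-in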

  12. Implementing Shared Memory Parallelism in MCBEND

    NASA Astrophysics Data System (ADS)

    Bird, Adam; Long, David; Dobson, Geoff

    2017-09-01

    MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.

  13. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    PubMed

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculations of a large molecule with a high-quality basis set running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.

  14. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

  15. Porting LAMMPS to GPUs.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, William Michael; Plimpton, Steven James; Wang, Peng

    2010-03-01

    LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.

  16. FPGA-based protein sequence alignment : A review

    NASA Astrophysics Data System (ADS)

    Isa, Mohd. Nazrin Md.; Muhsen, Ku Noor Dhaniah Ku; Saiful Nurdin, Dayana; Ahmad, Muhammad Imran; Anuar Zainol Murad, Sohiful; Nizam Mohyar, Shaiful; Harun, Azizi; Hussin, Razaidi

    2017-11-01

    Sequence alignment has been optimized using several techniques that accelerate the computation of the optimal score by implementing DP-based algorithms in hardware such as FPGA-based platforms. During hardware implementation there are performance challenges, such as frequent memory access and the highly data-dependent computation process. This paper therefore focuses on the processing element (PE) configuration, which involves memory access to load the configuration data (substitution matrix, query sequence characters), and on the PE configuration time. Previous works have taken various approaches to enhance PE configuration performance, such as serial and parallel configuration chains, in which the configuration data are loaded into the PEs sequentially or simultaneously, respectively. Some researchers have shown that a parallel configuration chain optimizes both configuration time and area.

  17. Distributed computing feasibility in a non-dedicated homogeneous distributed system

    NASA Technical Reports Server (NTRS)

    Leutenegger, Scott T.; Sun, Xian-He

    1993-01-01

    The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks, is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It is proposed that the task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.

  18. Report to the High Order Language Working Group (HOLWG)

    DTIC Science & Technology

    1977-01-14

    as running, runnable, suspended or dormant, may be synchronized by semaphore variables, may be scheduled using clock and duration data types and mpy...Recursive and non-recursive routines G6. Parallel processes, synchronization, critical regions G7. User defined parameterized exception handling G8...typed and lacks extensibility, parallel processing, synchronization and real-time features. Overall Evaluation IBM strongly recommended PL/I as a

  19. Evaluating SPLASH-2 Applications Using MapReduce

    NASA Astrophysics Data System (ADS)

    Zhu, Shengkai; Xiao, Zhiwei; Chen, Haibo; Chen, Rong; Zhang, Weihua; Zang, Binyu

    MapReduce has been prevalent for running data-parallel applications. By hiding non-functional concerns such as parallelism, fault tolerance and load balancing from programmers, MapReduce significantly simplifies the programming of large clusters. Due to these features, researchers have also explored the use of MapReduce in other application domains, such as machine learning, textual retrieval and statistical translation, among others.

  20. Automatic Adaptation of Tunable Distributed Applications

    DTIC Science & Technology

    2001-01-01

    size, weight, and battery life, with a single CPU, less memory, smaller hard disk, and lower bandwidth network connectivity. The power of PDAs is...wireless, and Bluetooth [32] facilities; thus achieving different rates of data transmission. With the trend of “write once, run everywhere...applications, a single component can execute on multiple processors (or machines) in parallel. These parallel applications, written in a specialized language

  1. Simulation framework for intelligent transportation systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ewing, T.; Doss, E.; Hanebutte, U.

    1996-10-01

    A simulation framework has been developed for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed for running on parallel computers and distributed (networked) computer systems, but can run on standalone workstations for smaller simulations. The simulator currently models instrumented smart vehicles with in-vehicle navigation units capable of optimal route planning and Traffic Management Centers (TMC). The TMC has probe vehicle tracking capabilities (display position and attributes of instrumented vehicles), and can provide two-way interaction with traffic to provide advisories and link times. Both the in-vehicle navigation module and the TMC feature detailed graphical user interfaces to support human-factors studies. Realistic modeling of variations of the posted driving speed is based on human factors studies that take into consideration weather, road conditions, driver personality and behavior, and vehicle type. The prototype has been developed on a distributed system of networked UNIX computers but is designed to run on parallel computers, such as ANL's IBM SP-2, for large-scale problems. A novel feature of the approach is that vehicles are represented by autonomous computer processes which exchange messages with other processes. The vehicles have a behavior model which governs route selection and driving behavior, and can react to external traffic events much like real vehicles. With this approach, the simulation is scalable to take advantage of emerging massively parallel processor (MPP) systems.
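
    The vehicles-as-processes idea can be miniaturized with Python's multiprocessing module; this sketch shows only the message-passing shape (vehicle processes reporting to a TMC process through a queue), with the driving behavior and advisory logic reduced to placeholders.

      # Autonomous vehicle processes reporting to a TMC process.
      from multiprocessing import Process, Queue

      def vehicle(vid, q, n_steps=5):
          pos = 0.0
          for step in range(n_steps):
              pos += 1.0                   # stand-in for driving behavior
              q.put((vid, step, pos))      # probe report to the TMC
          q.put((vid, None, None))         # done marker

      def tmc(q, n_vehicles):
          done = 0
          while done < n_vehicles:
              vid, step, pos = q.get()
              if step is None:
                  done += 1
              # ...update link times / issue advisories from reports here...

      if __name__ == "__main__":
          q = Queue()
          cars = [Process(target=vehicle, args=(i, q)) for i in range(4)]
          for c in cars:
              c.start()
          tmc(q, len(cars))
          for c in cars:
              c.join()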

  2. AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics

    NASA Astrophysics Data System (ADS)

    Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.

    2017-05-01

    We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional self-consistent grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between the system and GPU memory, executes CUDA kernels, and writes simulation outputs on the disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving hybrid model equations are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an already existing well-tested hybrid model of plasma that runs in parallel using multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s Equation, resulting in an explicit-implicit scheme for the hybrid model equation. We show that the proposed scheme is stable and accurate. We examine the AMITIS energy conservation and show that the energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.

  3. Setting Standards for Medically-Based Running Analysis

    PubMed Central

    Vincent, Heather K.; Herman, Daniel C.; Lear-Barnes, Leslie; Barnes, Robert; Chen, Cong; Greenberg, Scott; Vincent, Kevin R.

    2015-01-01

    Setting standards for medically based running analyses is necessary to ensure that runners receive a high-quality service from practitioners. Medical and training history, physical and functional tests, and motion analysis of running at self-selected and faster speeds are key features of a comprehensive analysis. Self-reported history and movement symmetry are critical factors that require follow-up therapy or long-term management. Pain or injury is typically the result of a functional deficit above or below the site along the kinematic chain. PMID:25014394

  4. The Epistemological Chain: Practical Applications in Sports

    ERIC Educational Resources Information Center

    Grecic, David; Collins, Dave

    2013-01-01

    This article highlights the role of personal epistemology in decision-making and proposes the construct of an epistemological chain (EC) to support this process in the domain of sports coaching. First, the EC is outlined using examples from education and other parallel disciplines. What it looks like to sports coaches is then described, and its…

  5. Genetic Parallel Programming: design and implementation.

    PubMed

    Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

    2006-01-01

    This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

  6. Parallel-In-Time For Moving Meshes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Falgout, R. D.; Manteuffel, T. A.; Southworth, B.

    2016-02-04

    With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.

  7. Sharp organic interface of molecular C60 chains and a pentacene derivative SAM on Au(788): A combined STM & DFT study

    NASA Astrophysics Data System (ADS)

    Wang, Jun; Tang, Jian-Ming; Larson, Amanda M.; Miller, Glen P.; Pohl, Karsten

    2013-12-01

    Controlling the molecular structure of the donor-acceptor interface is essential to overcoming the efficiency bottleneck in organic photovoltaics. We present a study of self-assembled fullerene (C60) molecular chains on perfectly ordered 6,13-dichloropentacene (DCP) monolayers forming on a vicinal Au(788) surface using scanning tunneling microscopy in conjunction with density functional theory calculations. DCP is a novel pentacene derivative optimized for photovoltaic applications. The molecules form a brick-wall patterned centered rectangular lattice with the long axis parallel to the monatomic steps that separate the 3.9 nm wide Au(111) terraces. The strong interaction between the C60 molecules and the gold substrate is well screened by the DCP monolayer. At submonolayer C60 coverage, the fullerene molecules form long parallel chains, 1.1 nm apart, with a rectangular arrangement instead of the expected close-packed configuration along the upper step edges. The perfectly ordered DCP structure is unaffected by the C60 chain formation. The controlled sharp highly-ordered organic interface has the potential to improve the conversion efficiency in organic photovoltaics.

  8. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  10. Self-assembling semiconducting polymers--rods and gels from electronic materials.

    PubMed

    Clark, Andrew P-Z; Shi, Chenjun; Ng, Benny C; Wilking, James N; Ayzner, Alexander L; Stieg, Adam Z; Schwartz, Benjamin J; Mason, Thomas G; Rubin, Yves; Tolbert, Sarah H

    2013-02-26

    In an effort to favor the formation of straight polymer chains without crystalline grain boundaries, we have synthesized an amphiphilic conjugated polyelectrolyte, poly(fluorene-alt-thiophene) (PFT), which self-assembles in aqueous solutions to form cylindrical micelles. In contrast to many diblock copolymer assemblies, the semiconducting backbone runs parallel, not perpendicular, to the long axis of the cylindrical micelle. Solution-phase micelle formation is observed by X-ray and visible light scattering. The micelles can be cast as thin films, and the cylindrical morphology is preserved in the solid state. The effects of self-assembly are also observed through spectral shifts in optical absorption and photoluminescence. Solutions of higher-molecular-weight PFT micelles form gel networks at sufficiently high aqueous concentrations. Rheological characterization of the PFT gels reveals solid-like behavior and strain hardening below the yield point, properties similar to those found in entangled gels formed from surfactant-based micelles. Finally, electrical measurements on diode test structures indicate that, despite a complete lack of crystallinity in these self-assembled polymers, they effectively conduct electricity.

  11. Epithelial innervation of human cornea: a three-dimensional study using confocal laser scanning fluorescence microscopy.

    PubMed

    Guthoff, Rudolf F; Wienss, Holger; Hahnel, Christian; Wree, Andreas

    2005-07-01

    Evaluation of a new method to visualize distribution and morphology of human corneal nerves (Adelta- and C-fibers) by means of fluorescence staining, confocal laser scanning microscopy, and 3-dimensional (3D) reconstruction. Trephinates of corneas with a diagnosis of Fuchs corneal dystrophy were sliced into layers of 200 microm thickness using a Draeger microkeratome (Storz, Germany). The anterior lamella was stained with the Life/Dead-Kit (Molecular Probes Inc.), examined by the confocal laser scanning microscope "Odyssey XL," step size between 0.5 and 1 microm, and optical sections were digitally 3D-reconstructed. Immediate staining of explanted corneas by the Life/Dead-Kit gave a complete picture of the nerves in the central human cornea. Thin nerves running parallel to the Bowman layer in the subepithelial plexus perforate the Bowman layer orthogonally through tube-like structures. Passing the Bowman layer, Adelta- and C-fibers can be clearly distinguished by fiber diameter, and, while running in the basal epithelial plexus, by their spatial arrangement. Adelta-fibers run straight and parallel to the Bowman layer underneath the basal cell layer. C-fibers, after a short run parallel to the Bowman layer, send off multiple branches penetrating epithelial cell layers orthogonally, ending blindly in invaginations of the superficial cells. In contrast to C-fibers, Adelta-fibers show characteristic bulbous formations when kinking into the basal epithelial plexus. Ex-vivo fluorescence staining of the cornea and 3D reconstructions of confocal scans provide a fast and easily reproducible tool to visualize nerves of the anterior living cornea at high resolution. This may help to clarify gross variations of nerve fiber patterns under various clinical and experimental conditions.

  12. Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hasenkamp, Daren; Sim, Alexander; Wehner, Michael

    Extensive computing power has been used to tackle issues such as climate changes, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently only run on a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of the cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup that is comparable to running the same analysis task using MPI. However, compared to MPI based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure; as long as a single VM is running, it can make progress, whereas as soon as one MPI node fails the whole analysis job fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.

  13. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    NASA Technical Reports Server (NTRS)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.

  14. The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

    1997-01-01

    Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore exploit, behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring System (AIMS) has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAS Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.

  15. Image sensor with high dynamic range linear output

    NASA Technical Reports Server (NTRS)

    Yadid-Pecht, Orly (Inventor); Fossum, Eric R. (Inventor)

    2007-01-01

    Designs and operational methods to increase the dynamic range of image sensors, and APS devices in particular, by achieving more than one integration time for each pixel. An APS system with more than one column-parallel signal chain for readout is described for maintaining a high frame rate during readout. Each active pixel is sampled multiple times during a single frame readout, resulting in multiple integration times. The operational methods can also be used to obtain multiple integration times for each pixel with an APS design having a single column-parallel signal chain for readout. Furthermore, analog-to-digital conversion of high speed and high resolution can be implemented.
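
    The dynamic-range gain from multiple integration times can be seen in a small numeric sketch (illustrative only; parameter names and values are assumptions, not from the patent): when the long-exposure sample saturates, the short-exposure sample is scaled by the exposure ratio to recover a linear value.

    ```cpp
    #include <cstdio>

    // Illustrative linear HDR reconstruction from two integration times,
    // T_long = k * T_short. If the long sample saturates (full well),
    // fall back to the short sample scaled by k, extending range by k.
    double hdr_value(double long_sample, double short_sample,
                     double k, double full_well) {
        if (long_sample < full_well) return long_sample;
        return short_sample * k;
    }

    int main() {
        // Long sample clipped at 4095; short sample 600 with 8x shorter
        // exposure reconstructs a linear value of 4800.
        std::printf("%f\n", hdr_value(4095.0, 600.0, 8.0, 4095.0));
    }
    ```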

  16. 26 CFR 1.958-1 - Direct and indirect ownership of stock.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... treated as actually owned by such person. Thus, this rule creates a chain of ownership; however, since the... United States person in the chain of ownership running from the foreign entity. The application of this... Corporation. Example 4. Among the assets of foreign estate W are Blackacre and a block of stock, consisting of...

  17. 26 CFR 1.958-1 - Direct and indirect ownership of stock.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... treated as actually owned by such person. Thus, this rule creates a chain of ownership; however, since the... United States person in the chain of ownership running from the foreign entity. The application of this... Corporation. Example 4. Among the assets of foreign estate W are Blackacre and a block of stock, consisting of...

  18. 26 CFR 1.958-1 - Direct and indirect ownership of stock.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... treated as actually owned by such person. Thus, this rule creates a chain of ownership; however, since the... United States person in the chain of ownership running from the foreign entity. The application of this... Corporation. Example 4. Among the assets of foreign estate W are Blackacre and a block of stock, consisting of...

  19. 26 CFR 1.958-1 - Direct and indirect ownership of stock.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... treated as actually owned by such person. Thus, this rule creates a chain of ownership; however, since the... United States person in the chain of ownership running from the foreign entity. The application of this... Corporation. Example 4. Among the assets of foreign estate W are Blackacre and a block of stock, consisting of...

  20. 26 CFR 1.958-1 - Direct and indirect ownership of stock.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... treated as actually owned by such person. Thus, this rule creates a chain of ownership; however, since the... United States person in the chain of ownership running from the foreign entity. The application of this... Corporation. Example 4. Among the assets of foreign estate W are Blackacre and a block of stock, consisting of...

  1. Language Classification using N-grams Accelerated by FPGA-based Bloom Filters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacob, A; Gokhale, M

    N-Gram (n-character sequences in text documents) counting is a well-established technique used in classifying the language of text in a document. In this paper, n-gram processing is accelerated through the use of reconfigurable hardware on the XtremeData XD1000 system. Our design employs parallelism at multiple levels, with parallel Bloom filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (the HAIL algorithm) that uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our implementation of end-to-end language classification runs at 85x the speed of comparable software and 1.45x that of the competing hardware design.
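
    A minimal software sketch of the core data structure (the size and hash functions are illustrative assumptions, not the XD1000 design): a Bloom filter answers "was this n-gram seen in the training text for language L?" with no false negatives and a tunable false-positive rate.

    ```cpp
    #include <bitset>
    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Minimal Bloom filter for n-gram membership tests: k=3 seeded
    // FNV-1a hashes into a 2^20-bit table.
    struct Bloom {
        std::bitset<1 << 20> bits;
        static uint64_t fnv1a(const std::string& s, uint64_t seed) {
            uint64_t h = 1469598103934665603ull ^ seed;
            for (unsigned char c : s) { h ^= c; h *= 1099511628211ull; }
            return h;
        }
        void add(const std::string& s) {
            for (uint64_t seed = 0; seed < 3; ++seed)
                bits.set(fnv1a(s, seed) % bits.size());
        }
        bool maybe_contains(const std::string& s) const {
            for (uint64_t seed = 0; seed < 3; ++seed)
                if (!bits.test(fnv1a(s, seed) % bits.size())) return false;
            return true;  // may be a false positive, never a false negative
        }
    };

    int main() {
        static Bloom english;           // static: the bit table is large
        english.add("the_");            // a 4-gram from training text
        std::printf("%d\n", (int)english.maybe_contains("the_"));
    }
    ```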

  2. A Parallel Saturation Algorithm on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Ezekiel, Jonathan; Siminiceanu

    2007-01-01

    Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor, dual-core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.
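
    Conceptually, the events associated with one decision-diagram node are fired as independent tasks. A minimal sketch of that task-pool idea, with fire_event() as a stand-in for the real saturation step:

    ```cpp
    #include <cstdio>
    #include <future>
    #include <vector>

    // Stand-in for firing one event local to a decision-diagram node.
    int fire_event(int event) { return event * event; }

    int main() {
        std::vector<int> events = {1, 2, 3, 4};
        // Launch one task per event; a real thread pool would bound the
        // number of workers instead of spawning per task.
        std::vector<std::future<int>> tasks;
        for (int e : events)
            tasks.push_back(std::async(std::launch::async, fire_event, e));
        for (auto& t : tasks) std::printf("%d\n", t.get());
    }
    ```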

  3. Sierra Structural Dynamics User's Notes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reese, Garth M.

    2015-10-19

    Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a user's guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.

  4. Sierra/SD User's Notes.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Munday, Lynn Brendon; Day, David M.; Bunting, Gregory

    Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a user's guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.

  5. LLMapReduce: Multi-Lingual Map-Reduce for Supercomputing Environments

    DTIC Science & Technology

    2015-11-20

    1990s. Popularized by Google [36] and Apache Hadoop [37], map-reduce has become a staple technology of the ever-growing big data community...Lexington, MA, U.S.A. Abstract—The map-reduce parallel programming model has become extremely popular in the big data community. Many big data...to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming
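
    For readers unfamiliar with the model this record refers to, the classic map-reduce example is word count: map emits (word, 1) pairs and reduce sums the counts per key. The single-process sketch below shows only the programming model; LLMapReduce's contribution is scheduling such phases across supercomputer nodes:

    ```cpp
    #include <cstdio>
    #include <map>
    #include <sstream>
    #include <string>

    int main() {
        std::istringstream in("big data big compute");
        std::map<std::string, int> counts;
        std::string word;
        while (in >> word) ++counts[word];   // map: emit (word, 1); combine
        for (auto& [w, c] : counts)          // reduce: per-key totals
            std::printf("%s %d\n", w.c_str(), c);
    }
    ```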

  6. Advanced Numerical Techniques of Performance Evaluation. Volume 1

    DTIC Science & Technology

    1990-06-01

    system scheduling thread. The scheduling thread then runs any other ready thread that can be found. A thread can only sleep or switch out on itself...Polychronopoulos and D.J. Kuck. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Transactions on Computers C...Kuck 1987] C.D. Polychronopoulos and D.J. Kuck. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Trans. on Comp

  7. StrAuto: automation and parallelization of STRUCTURE analysis.

    PubMed

    Chhatre, Vikram E; Emerson, Kevin J

    2017-03-24

    Population structure inference using the software STRUCTURE has become an integral part of population genetic studies covering a broad spectrum of taxa including humans. The ever-expanding size of genetic data sets poses computational challenges for this analysis. Although at least one tool currently implements parallel computing to reduce the computational burden of this analysis, it does not fully automate the use of replicate STRUCTURE analysis runs required for downstream inference of the optimal K. There is a pressing need for a tool that can deploy population structure analysis on high performance computing clusters. We present an updated version of the popular Python program StrAuto, to streamline population structure analysis using parallel computing. StrAuto implements a pipeline that combines STRUCTURE analysis with the Evanno ΔK analysis and visualization of results using STRUCTURE HARVESTER. Using benchmarking tests, we demonstrate that StrAuto significantly reduces the computational time needed to perform iterative STRUCTURE analysis by distributing runs over two or more processors. StrAuto is the first tool to integrate STRUCTURE analysis with post-processing using a pipeline approach, in addition to implementing parallel computation - a setup ideal for deployment on computing clusters. StrAuto is distributed under the GNU GPL (General Public License) and is available for download from http://strauto.popgen.org.
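
    The parallelization opportunity StrAuto exploits is that replicate STRUCTURE runs for each K are independent. A minimal sketch of that fan-out, with run_structure() as a hypothetical stand-in for launching the real external program:

    ```cpp
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Stand-in for invoking one STRUCTURE process for a given K and
    // replicate; the real tool launches and monitors separate processes.
    void run_structure(int k, int rep) {
        std::printf("running K=%d replicate=%d\n", k, rep);
    }

    int main() {
        std::vector<std::thread> workers;
        for (int k = 1; k <= 5; ++k)          // sweep candidate K values
            for (int rep = 0; rep < 3; ++rep) // independent replicates
                workers.emplace_back(run_structure, k, rep);
        for (auto& w : workers) w.join();
    }
    ```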

  8. 29 CFR 570.65 - Occupations involved in the operations of circular saws, band saws, and guillotine shears (Order...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ..., running over wheels or pulleys, and used for sawing materials. (6) The term guillotine shear shall mean a machine equipped with a movable blade operated vertically and used to shear materials. The term shall not... series of notches or teeth, running over wheels or pulleys, and used for sawing materials. Chain saw...

  9. A Concept for Run-Time Support of the Chapel Language

    NASA Technical Reports Server (NTRS)

    James, Mark

    2006-01-01

    A document presents a concept for run-time implementation of other concepts embodied in the Chapel programming language. (Now undergoing development, Chapel is intended to become a standard language for parallel computing that would surpass older such languages in both computational performance and in the efficiency with which pre-existing code can be reused and new code written.) The aforementioned other concepts are those of distributions, domains, allocations, and access, as defined in a separate document called "A Semantic Framework for Domains and Distributions in Chapel" and linked to a language specification defined in another separate document called "Chapel Specification 0.3." The concept presented in the instant report is the recognition that a data domain that was invented for Chapel offers a novel approach to distributing and processing data in a massively parallel environment. The concept is offered as a starting point for development of working descriptions of functions and data structures that would be necessary to implement interfaces to a compiler for transforming the aforementioned other concepts from their representations in Chapel source code to their run-time implementations.

  10. User's and test case manual for FEMATS

    NASA Technical Reports Server (NTRS)

    Chatterjee, Arindam; Volakis, John; Nurnberger, Mike; Natzke, John

    1995-01-01

    The FEMATS program incorporates first-order edge-based finite elements and vector absorbing boundary conditions into the scattered field formulation for computation of the scattering from three-dimensional geometries. The code has been validated extensively for a large class of geometries containing inhomogeneities and satisfying transition conditions. For geometries that are too large for the workstation environment, the FEMATS code has been optimized to run on various supercomputers. Currently, FEMATS has been configured to run on the HP 9000 workstation, vectorized for the Cray Y-MP, and parallelized to run on the Kendall Square Research (KSR) architecture and the Intel Paragon.

  11. CudaChain: an alternative algorithm for finding 2D convex hulls on the GPU.

    PubMed

    Mei, Gang

    2016-01-01

    This paper presents an alternative GPU-accelerated convex hull algorithm and a novel Sorting-based Preprocessing Approach (SPA) for planar point sets. The proposed convex hull algorithm, termed CudaChain, consists of two stages: (1) two rounds of preprocessing performed on the GPU and (2) the finalization of calculating the expected convex hull on the CPU. Interior points lying inside a quadrilateral formed by four extreme points are first discarded, and the remaining points are then distributed into several (typically four) subregions. The points in each subset are first sorted in parallel; then the second round of discarding is performed using SPA; and finally a simple chain is formed from the points that remain. A simple polygon can be easily generated by directly connecting all the chains in the subregions. The expected convex hull of the input points can finally be obtained by calculating the convex hull of the simple polygon. The library Thrust is utilized to realize the parallel sorting, reduction, and partitioning for better efficiency and simplicity. Experimental results show that: (1) SPA can very effectively detect and discard the interior points; and (2) CudaChain achieves 5×-6× speedups over the famous Qhull implementation for 20M points.
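
    The first round of discarding rests on a simple geometric fact: any point strictly inside the quadrilateral of the four extreme points cannot be a hull vertex. A CPU-side sketch of that test (the paper performs it in parallel on the GPU):

    ```cpp
    #include <cstdio>

    struct P { double x, y; };

    // cross() > 0 means p lies to the left of the directed edge a->b.
    double cross(P a, P b, P p) {
        return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
    }

    // The quadrilateral of extreme points is taken counter-clockwise;
    // a point strictly left of all four edges is interior and discardable.
    bool inside_quad(const P q[4], P p) {
        for (int i = 0; i < 4; ++i)
            if (cross(q[i], q[(i + 1) % 4], p) <= 0) return false;
        return true;
    }

    int main() {
        P quad[4] = {{0, 0}, {4, 0}, {4, 4}, {0, 4}};  // CCW extremes
        std::printf("%d\n", (int)inside_quad(quad, {2, 2}));  // 1: discard
    }
    ```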

  12. Message Passing on GPUs

    NASA Astrophysics Data System (ADS)

    Stuart, J. A.

    2011-12-01

    This paper explores the challenges in implementing a message passing interface usable on systems with data-parallel processors, and more specifically GPUs. As a case study, we design and implement the "DCGN" API on NVIDIA GPUs, which is similar to MPI and allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to map resources to MPI ranks. We use a method that also allows the data-parallel processors to run autonomously from user-written CPU code. In order to facilitate communication, we use a sleep-based polling system to store and retrieve messages. Unlike previous systems, our method provides both performance and flexibility. By running a test suite of applications with different communication requirements, we find that a tolerable amount of overhead is incurred, somewhere between one and five percent depending on the application, and indicate the locations where this overhead accumulates. We conclude that with innovations in chipsets and drivers, this overhead will be mitigated and provide similar performance to typical CPU-based MPI implementations while providing fully-dynamic communication.
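
    A sleep-based polling scheme of the kind described can be sketched with a mailbox flag that one side sets and the other periodically checks, trading a little latency for low polling overhead (a minimal host-side analogy, not the DCGN implementation):

    ```cpp
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    std::atomic<bool> mailbox_full{false};

    int main() {
        // Producer stands in for a GPU rank depositing a message.
        std::thread producer([] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            mailbox_full.store(true);
        });
        // Consumer polls with a sleep between checks rather than busy-waiting.
        while (!mailbox_full.load())
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        std::printf("message received\n");
        producer.join();
    }
    ```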

  13. Online measurement for geometrical parameters of wheel set based on structure light and CUDA parallel processing

    NASA Astrophysics Data System (ADS)

    Wu, Kaihua; Shao, Zhencheng; Chen, Nian; Wang, Wenjie

    2018-01-01

    The degree of wear of the wheel-set tread is one of the main factors that influence the safety and stability of a running train. The geometrical parameters mainly include flange thickness and flange height. Line-structured laser light was projected onto the wheel tread surface, and the geometrical parameters can be deduced from the profile image. An online image acquisition system was designed based on asynchronous reset of the CCD and a CUDA parallel processing unit. Image acquisition was fulfilled in hardware-interrupt mode. A high-efficiency parallel segmentation algorithm based on CUDA is proposed. The algorithm first divides the image into smaller squares, and extracts the squares of the target by a fusion of the k_means and STING clustering image segmentation algorithms. Segmentation time is less than 0.97 ms. A considerable acceleration ratio compared with serial CPU calculation was obtained, which greatly improves the real-time image processing capacity. When a wheel set is running at limited speed, the system, placed alongside the railway line, can measure the geometrical parameters automatically. The maximum measuring speed is 120 km/h.

  14. The Acquisition of Pronouns by French Children: A Parallel Study of Production and Comprehension

    ERIC Educational Resources Information Center

    Zesiger, Pascal; Zesiger, Laurence Chillier; Arabatzi, Marina; Baranzini, Lara; Cronel-Ohayon, Stephany; Franck, Julie; Frauenfelder, Ulrich Hans; Hamann, Cornelia; Rizzi, Luigi

    2010-01-01

    This study examines syntactic and morphological aspects of the production and comprehension of pronouns by 99 typically developing French-speaking children aged 3 years, 5 months to 6 years, 5 months. A fine structural analysis of subject, object, and reflexive clitics suggests that whereas the object clitic chain crosses the subject chain, the…

  15. Partitioning in parallel processing of production systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oflazer, K.

    1987-01-01

    This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.
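
    A common approximate approach to partitioning problems of this kind, shown only to illustrate the flavor of the problem (the thesis formulates its own algorithm), is greedy load balancing: assign each production, in decreasing order of estimated match cost, to the currently lightest partition.

    ```cpp
    #include <algorithm>
    #include <array>
    #include <cstdio>
    #include <vector>

    int main() {
        // Estimated per-production match costs, sorted descending
        // (the values are made up for illustration).
        std::vector<int> costs = {9, 7, 6, 5, 4, 3, 2};
        std::array<int, 3> load{};  // three partitions, as in the thesis runs
        for (int c : costs) {
            // Place the next-heaviest production on the lightest partition.
            *std::min_element(load.begin(), load.end()) += c;
        }
        for (int l : load) std::printf("partition load: %d\n", l);
    }
    ```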

  16. Parallel transformation of K-SVD solar image denoising algorithm

    NASA Astrophysics Data System (ADS)

    Liang, Youwen; Tian, Yu; Li, Mei

    2017-02-01

    The images obtained by observing the sun through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and to the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version, following a data-parallelism model. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance were tested after completion of the parallel algorithm. The speedup of the program is 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easily ported to multi-core platforms.
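
    The shape of the data-parallel transformation is a parallel loop over dictionary atoms, since each atom's update within a sweep is independent. A minimal OpenMP sketch (compile with -fopenmp; update_atom() is a placeholder for the real rank-1 update, not the paper's code):

    ```cpp
    #include <cstdio>
    #include <vector>

    // Placeholder for the real K-SVD atom refinement step.
    void update_atom(std::vector<double>& atom) {
        for (double& v : atom) v *= 0.5;
    }

    int main() {
        // 256 atoms of dimension 64, initialized arbitrarily.
        std::vector<std::vector<double>> dict(256, std::vector<double>(64, 1.0));
        // Atoms are updated independently within one sweep, so the loop
        // over atoms parallelizes directly.
        #pragma omp parallel for
        for (int k = 0; k < (int)dict.size(); ++k)
            update_atom(dict[k]);
        std::printf("%f\n", dict[0][0]);
    }
    ```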

  17. Parallelization of the FLAPW method

    NASA Astrophysics Data System (ADS)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  18. Parallel programming with Easy Java Simulations

    NASA Astrophysics Data System (ADS)

    Esquembre, F.; Christian, W.; Belloni, M.

    2018-01-01

    Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.

  19. Parallel evolution of image processing tools for multispectral imagery

    NASA Astrophysics Data System (ADS)

    Harvey, Neal R.; Brumby, Steven P.; Perkins, Simon J.; Porter, Reid B.; Theiler, James P.; Young, Aaron C.; Szymanski, John J.; Bloch, Jeffrey J.

    2000-11-01

    We describe the implementation and performance of a parallel, hybrid evolutionary-algorithm-based system, which optimizes image processing tools for feature-finding tasks in multi-spectral imagery (MSI) data sets. Our system uses an integrated spatio-spectral approach and is capable of combining suitably-registered data from different sensors. We investigate the speed-up obtained by parallelization of the evolutionary process via multiple processors (a workstation cluster) and develop a model for prediction of run-times for different numbers of processors. We demonstrate our system on Landsat Thematic Mapper MSI, covering the recent Cerro Grande fire at Los Alamos, NM, USA.

  20. Methods for operating parallel computing systems employing sequenced communications

    DOEpatents

    Benner, R.E.; Gustafson, J.L.; Montry, G.R.

    1999-08-10

    A parallel computing system and method are disclosed having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system. 15 figs.

  1. Methods for operating parallel computing systems employing sequenced communications

    DOEpatents

    Benner, Robert E.; Gustafson, John L.; Montry, Gary R.

    1999-01-01

    A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.

  2. Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1989-01-01

    The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1-, 2- and 3-processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared memory parallel processors provided that the synchronization routines are reproduced on the target system.

  3. Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

    PubMed Central

    Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew

    2014-01-01

    Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950

  4. Template-directed atomically precise self-organization of perfectly ordered parallel cerium silicide nanowire arrays on Si(110)-16 × 2 surfaces.

    PubMed

    Hong, Ie-Hong; Liao, Yung-Cheng; Tsai, Yung-Feng

    2013-11-05

    The perfectly ordered parallel arrays of periodic Ce silicide nanowires can self-organize with atomic precision on single-domain Si(110)-16 × 2 surfaces. The growth evolution of self-ordered parallel Ce silicide nanowire arrays is investigated over a broad range of Ce coverages on single-domain Si(110)-16 × 2 surfaces by scanning tunneling microscopy (STM). Three different types of well-ordered parallel arrays, consisting of uniformly spaced and atomically identical Ce silicide nanowires, are self-organized through the heteroepitaxial growth of Ce silicides on a long-range grating-like 16 × 2 reconstruction at the deposition of various Ce coverages. Each atomically precise Ce silicide nanowire consists of a bundle of chains and rows with different atomic structures. The atomic-resolution dual-polarity STM images reveal that the interchain coupling leads to the formation of the registry-aligned chain bundles within individual Ce silicide nanowire. The nanowire width and the interchain coupling can be adjusted systematically by varying the Ce coverage on a Si(110) surface. This natural template-directed self-organization of perfectly regular parallel nanowire arrays allows for the precise control of the feature size and positions within ±0.2 nm over a large area. Thus, it is a promising route to produce parallel nanowire arrays in a straightforward, low-cost, high-throughput process.

  5. Template-directed atomically precise self-organization of perfectly ordered parallel cerium silicide nanowire arrays on Si(110)-16 × 2 surfaces

    PubMed Central

    2013-01-01

    The perfectly ordered parallel arrays of periodic Ce silicide nanowires can self-organize with atomic precision on single-domain Si(110)-16 × 2 surfaces. The growth evolution of self-ordered parallel Ce silicide nanowire arrays is investigated over a broad range of Ce coverages on single-domain Si(110)-16 × 2 surfaces by scanning tunneling microscopy (STM). Three different types of well-ordered parallel arrays, consisting of uniformly spaced and atomically identical Ce silicide nanowires, are self-organized through the heteroepitaxial growth of Ce silicides on a long-range grating-like 16 × 2 reconstruction at the deposition of various Ce coverages. Each atomically precise Ce silicide nanowire consists of a bundle of chains and rows with different atomic structures. The atomic-resolution dual-polarity STM images reveal that the interchain coupling leads to the formation of the registry-aligned chain bundles within individual Ce silicide nanowire. The nanowire width and the interchain coupling can be adjusted systematically by varying the Ce coverage on a Si(110) surface. This natural template-directed self-organization of perfectly regular parallel nanowire arrays allows for the precise control of the feature size and positions within ±0.2 nm over a large area. Thus, it is a promising route to produce parallel nanowire arrays in a straightforward, low-cost, high-throughput process. PMID:24188092

  6. VASA: Interactive Computational Steering of Large Asynchronous Simulation Pipelines for Societal Infrastructure.

    PubMed

    Ko, Sungahn; Zhao, Jieqiong; Xia, Jing; Afzal, Shehzad; Wang, Xiaoyu; Abram, Greg; Elmqvist, Niklas; Kne, Len; Van Riper, David; Gaither, Kelly; Kennedy, Shaun; Tolone, William; Ribarsky, William; Ebert, David S

    2014-12-01

    We present VASA, a visual analytics platform consisting of a desktop application, a component model, and a suite of distributed simulation components for modeling the impact of societal threats such as weather, food contamination, and traffic on critical infrastructure such as supply chains, road networks, and power grids. Each component encapsulates a high-fidelity simulation model that together form an asynchronous simulation pipeline: a system of systems of individual simulations with a common data and parameter exchange format. At the heart of VASA is the Workbench, a visual analytics application providing three distinct features: (1) low-fidelity approximations of the distributed simulation components using local simulation proxies to enable analysts to interactively configure a simulation run; (2) computational steering mechanisms to manage the execution of individual simulation components; and (3) spatiotemporal and interactive methods to explore the combined results of a simulation run. We showcase the utility of the platform using examples involving supply chains during a hurricane as well as food contamination in a fast food restaurant chain.

  7. Approaches in highly parameterized inversion - GENIE, a general model-independent TCP/IP run manager

    USGS Publications Warehouse

    Muffels, Christopher T.; Schreuder, Willem A.; Doherty, John E.; Karanovic, Marinko; Tonkin, Matthew J.; Hunt, Randall J.; Welter, David E.

    2012-01-01

    GENIE is a model-independent suite of programs that can be used to generally distribute, manage, and execute multiple model runs via the TCP/IP infrastructure. The suite consists of a file distribution interface, a run manager, a run executor, and a routine that can be compiled as part of a program and used to exchange model runs with the run manager. Because communication is via a standard protocol (TCP/IP), any computer connected to the Internet can serve in any of the capacities offered by this suite. Model independence is consistent with the existing template and instruction file protocols of the widely used PEST parameter estimation program. This report describes (1) the problem addressed; (2) the approach used by GENIE to queue, distribute, and retrieve model runs; and (3) user instructions, classes, and functions developed. It also includes (4) an example to illustrate the linking of GENIE with Parallel PEST using the interface routine.

  8. 29 CFR 570.65 - Occupations involving the operation of circular saws, band saws, guillotine shears, chain saws...

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... notches or teeth, running over wheels or pulleys, and used for sawing materials. Chain saw shall mean a... machine equipped with a moveable blade operated vertically and used to shear materials. The term shall not... moving blade that alternately changes direction on a linear cutting axis used for sawing materials. Wood...

  9. 29 CFR 570.65 - Occupations involving the operation of circular saws, band saws, guillotine shears, chain saws...

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... notches or teeth, running over wheels or pulleys, and used for sawing materials. Chain saw shall mean a... machine equipped with a moveable blade operated vertically and used to shear materials. The term shall not... moving blade that alternately changes direction on a linear cutting axis used for sawing materials. Wood...

  10. 29 CFR 570.65 - Occupations involved in the operations of circular saws, band saws, guillotine shears, chain saws...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... notches or teeth, running over wheels or pulleys, and used for sawing materials. Chain saw shall mean a... machine equipped with a moveable blade operated vertically and used to shear materials. The term shall not... moving blade that alternately changes direction on a linear cutting axis used for sawing materials. Wood...

  11. 29 CFR 570.65 - Occupations involving the operation of circular saws, band saws, guillotine shears, chain saws...

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... notches or teeth, running over wheels or pulleys, and used for sawing materials. Chain saw shall mean a... machine equipped with a moveable blade operated vertically and used to shear materials. The term shall not... moving blade that alternately changes direction on a linear cutting axis used for sawing materials. Wood...

  12. Local search to improve coordinate-based task mapping

    DOE PAGES

    Balzuweit, Evan; Bunde, David P.; Leung, Vitus J.; ...

    2015-10-31

    We present a local search strategy to improve the coordinate-based mapping of a parallel job's tasks to the MPI ranks of its parallel allocation in order to reduce network congestion and the job's communication time. The goal is to reduce the number of network hops between communicating pairs of ranks. Our target is applications with a nearest-neighbor stencil communication pattern running on mesh systems with non-contiguous processor allocation, such as Cray XE and XK Systems. Utilizing the miniGhost mini-app, which models the shock physics application CTH, we demonstrate that our strategy reduces application running time while also reducing the runtime variability. Furthermore, we show that mapping quality can vary based on the selected allocation algorithm, even between allocation algorithms of similar apparent quality.

  13. The instant sequencing task: Toward constraint-checking a complex spacecraft command sequence interactively

    NASA Technical Reports Server (NTRS)

    Horvath, Joan C.; Alkalaj, Leon J.; Schneider, Karl M.; Amador, Arthur V.; Spitale, Joseph N.

    1993-01-01

    Robotic spacecraft are controlled by sets of commands called 'sequences.' These sequences must be checked against mission constraints. Making our existing constraint checking program faster would enable new capabilities in our uplink process. Therefore, we are rewriting this program to run on a parallel computer. To do so, we had to determine how to run constraint-checking algorithms in parallel and create a new method of specifying spacecraft models and constraints. This new specification gives us a means of representing flight systems and their predicted response to commands which could be used in a variety of applications throughout the command process, particularly during anomaly or high-activity operations. This commonality could reduce operations cost and risk for future complex missions. Lessons learned in applying some parts of this system to the TOPEX/Poseidon mission will be described.

  14. Obtaining identical results with double precision global accuracy on different numbers of processors in parallel particle Monte Carlo simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cleveland, Mathew A., E-mail: cleveland7@llnl.gov; Brunner, Thomas A.; Gentile, Nicholas A.

    2013-10-15

    We describe and compare different approaches for achieving numerical reproducibility in photon Monte Carlo simulations. Reproducibility is desirable for code verification, testing, and debugging. Parallelism creates a unique problem for achieving reproducibility in Monte Carlo simulations because it changes the order in which values are summed. This is a numerical problem because double precision arithmetic is not associative. Parallel Monte Carlo simulations, both domain-replicated and domain-decomposed, will run their particles in a different order during different runs of the same simulation because of the non-reproducibility of communication between processors. In addition, runs of the same simulation using different domain decompositions will also result in particles being simulated in a different order. In [1], a way of eliminating non-associative accumulations using integer tallies was described. This approach successfully achieves reproducibility at the cost of lost accuracy by rounding double precision numbers to fewer significant digits. This integer approach, and other extended and reduced precision reproducibility techniques, are described and compared in this work. Increased precision alone is not enough to ensure reproducibility of photon Monte Carlo simulations. Non-arbitrary precision approaches require a varying degree of rounding to achieve reproducibility. For the problems investigated in this work, double precision global accuracy was achievable by using 100 bits of precision or greater on all unordered sums, which were subsequently rounded to double precision at the end of every time-step.
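
    The integer-tally idea from [1] can be shown in a few lines: rounding each contribution to fixed point makes the accumulation associative, so the total no longer depends on the order in which parallel ranks deposit their contributions (the scale value below is an illustrative assumption):

    ```cpp
    #include <cstdint>
    #include <cstdio>

    int main() {
        const double scale = 1e9;  // fixed-point resolution (illustrative)
        double contributions[3] = {0.1, 0.2, 0.3};  // assumed non-negative
        int64_t tally = 0;
        for (double c : contributions)
            tally += (int64_t)(c * scale + 0.5);  // round once, sum exactly
        // Integer addition is associative, so any summation order of the
        // rounded contributions yields this same total.
        std::printf("sum = %.9f\n", tally / scale);
    }
    ```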

  15. Linux Kernel Co-Scheduling and Bulk Synchronous Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, Terry R

    2012-01-01

    This paper describes a kernel scheduling algorithm that is based on coscheduling principles and that is intended for parallel applications running on 1000 cores or more. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.

  16. Parallel deterministic neutronics with AMR in 3D

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clouse, C.; Ferguson, J.; Hendrickson, C.

    1997-12-31

    AMTRAN, a three-dimensional Sn neutronics code with adaptive mesh refinement (AMR), has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block-refined AMR is used with linear finite element representations for the fluxes, which allows for a straightforward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.

  17. A GPU Parallelization of the Absolute Nodal Coordinate Formulation for Applications in Flexible Multibody Dynamics

    DTIC Science & Technology

    2012-02-17

    to be solved...data processing rather than data caching and control flow. To make use of this computational power, NVIDIA introduced a general purpose parallel...GPU implementations were run on an Intel Nehalem Xeon E5520 2.26GHz processor with an NVIDIA Tesla C2070 graphics card for varying numbers of

  18. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  19. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    PubMed

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    Solving fractional differential equations is very time consuming. The computational complexity of the two-dimensional time fractional diffusion equation (2D-TFDE) with an iterative implicit finite difference method is O(M(x)M(y)N(2)). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and data layout with a virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm agrees well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We expect that parallel computing technology will become a basic method for computationally intensive fractional applications in the near future.

  20. Parallel programming of industrial applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heroux, M; Koniges, A; Simon, H

    1998-07-21

    In the introductory material, we overview the typical MPP environment for real application computing and the special tools available, such as parallel debuggers and performance analyzers. Next, we draw from a series of real applications codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally, we end with a summary of the lessons learned from these applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled "Industrial Strength Parallel Computing" to be published by Morgan Kaufmann Publishers (ISBN 1-55860-54).

  1. Simplified Parallel Domain Traversal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erickson III, David J

    2011-01-01

    Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO2 and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.

  2. An efficient parallel algorithm for matrix-vector multiplication

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendrickson, B.; Leland, R.; Plimpton, S.

    The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
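
    For contrast with the paper's hypercube algorithm, the simplest shared-memory decomposition of y = Ax assigns disjoint blocks of rows to threads; the distributed algorithm additionally splits columns and combines partial sums across processors. A row-parallel OpenMP sketch (not the paper's algorithm; compile with -fopenmp):

    ```cpp
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 512;
        std::vector<double> A(n * n, 1.0), x(n, 2.0), y(n, 0.0);
        // Each thread computes a disjoint set of rows of y = A x,
        // so no synchronization is needed inside the loop.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int j = 0; j < n; ++j) s += A[i * n + j] * x[j];
            y[i] = s;
        }
        std::printf("y[0] = %f\n", y[0]);  // 512 * 1.0 * 2.0 = 1024
    }
    ```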

  3. Parallel consistent labeling algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samal, A.; Henderson, T.

    Mackworth and Freuder have analyzed the time complexity of several constraint satisfaction algorithms. Mohr and Henderson have given new algorithms, AC-4 and PC-3, for arc and path consistency, respectively, and have shown that the arc consistency algorithm is optimal in time complexity and of the same order space complexity as the earlier algorithms. In this paper, they give parallel algorithms for solving node and arc consistency. They show that any parallel algorithm for enforcing arc consistency in the worst case must have O(na) sequential steps, where n is the number of nodes, and a is the number of labels per node. They give several parallel algorithms to do arc consistency. It is also shown that they all have optimal time complexity. The results of running the parallel algorithms on a BBN Butterfly multiprocessor are also presented.

  4. Suppressing correlations in massively parallel simulations of lattice models

    NASA Astrophysics Data System (ADS)

    Kelling, Jeffrey; Ódor, Géza; Gemming, Sibylle

    2017-11-01

    For lattice Monte Carlo simulations, parallelization is crucial to make studies of large systems and long simulation times feasible, while sequential simulations remain the gold standard for correlation-free dynamics. Here, various domain decomposition schemes are compared, concluding with one which delivers virtually correlation-free simulations on GPUs. Extensive simulations of the octahedron model for 2 + 1 dimensional Kardar-Parisi-Zhang surface growth, which is very sensitive to correlation in the site-selection dynamics, were performed to show self-consistency of the parallel runs and agreement with the sequential algorithm. We present a GPU implementation providing a speedup of about 30× over a parallel CPU implementation on a single socket and at least 180× with respect to the sequential reference.
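
    The simplest member of the family of decompositions compared is the checkerboard scheme: sites of one parity are never neighbors on a square lattice, so each sublattice can be updated concurrently without conflicts. A minimal sketch with a placeholder update (the paper's octahedron-model dynamics are more involved):

    ```cpp
    #include <cstdio>
    #include <vector>

    int main() {
        const int L = 8;
        std::vector<int> spin(L * L, 1);
        // Two sequential half-sweeps; within each, sites of equal parity
        // share no neighbors and can be updated in parallel.
        for (int parity = 0; parity < 2; ++parity) {
            #pragma omp parallel for
            for (int i = 0; i < L * L; ++i)
                if ((i / L + i % L) % 2 == parity)
                    spin[i] = -spin[i];  // placeholder for a real MC update
        }
        std::printf("%d\n", spin[0]);
    }
    ```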

  5. Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2002-01-01

    Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C-based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.

  6. Crystal structure of catena-poly[N,N,N′,N′-tetramethylguanidinium [(chloridocadmate)-di-μ-chlorido

    PubMed Central

    Ndiaye, Mamadou; Samb, Abdoulaye; Diop, Libasse; Maris, Thierry

    2016-01-01

    In the structure of the title salt, {(C5H14N3)[CdCl3]}n, the CdII atom of the complex anion is five-coordinated by one terminal and four bridging Cl atoms. The corresponding coordination polyhedron is a distorted trigonal bipyramid, with Cd—Cl distances in the range 2.4829 (4)–2.6402 (4) Å. The bipyramids are condensed into a polyanionic zigzag chain extending parallel to [101]. The tetramethylguanidinium cations are situated between the polyanionic chains and are linked to them through N—H⋯Cl hydrogen bonds, forming a layered network parallel to (010). PMID:26870572

  7. Ethyl 2-[(carbamothioylamino)imino]propanoate.

    PubMed

    Corrêa, Charlane C; Graúdo, José Eugênio J C; de Oliveira, Luiz Fernando C; de Almeida, Mauro V; Diniz, Renata

    2011-08-01

    The title compound, C(6)H(11)N(3)O(2)S, consists of a roughly planar molecule (r.m.s. deviation from planarity = 0.077 Å for the non-H atoms) and has the S atom in an anti position to the imine N atom. This N atom is the acceptor of a strongly bent internal N-H⋯N hydrogen bond donated by the amino group. In the crystal, molecules are arranged in undulating layers parallel to (010). The molecules are linked via intermolecular amino-carboxyl N-H⋯O hydrogen bonds, forming chains parallel to [001]. The chains are cross-linked by N(carbazone)-H⋯S and C-H⋯S interactions, forming infinite sheets.

  8. Removal of suspended solids and turbidity from marble processing wastewaters by electrocoagulation: comparison of electrode materials and electrode connection systems.

    PubMed

    Solak, Murat; Kiliç, Mehmet; Hüseyin, Yazici; Sencan, Aziz

    2009-12-15

    In this study, the removal of suspended solids (SS) and turbidity from marble processing wastewaters by the electrocoagulation (EC) process was investigated using aluminium (Al) and iron (Fe) electrodes run in serial and parallel connection systems. To remove these pollutants from the marble processing wastewater, an EC reactor including monopolar electrodes (Al/Fe) in parallel and serial connection systems was utilized. The effects of operating parameters such as pH, current density, and electrolysis time on SS and turbidity removal were optimized. The EC process with monopolar Al electrodes in parallel and serial connections, carried out at the optimum conditions where the pH value was 9, the current density was approximately 15 A/m(2), and the electrolysis time was 2 min, resulted in 100% SS removal. Removal efficiencies of the EC process for SS with monopolar Fe electrodes in parallel and serial connection were found to be 99.86% and 99.94%, respectively. Optimum parameters for monopolar Fe electrodes in both connection types were found to be a pH value of 8 and an electrolysis time of 2 min. The optimum current density for Fe electrodes in serial and parallel connections was obtained at 10 and 20 A/m(2), respectively. Based on the results obtained, the EC process run with each type of electrode and connection was highly effective for the removal of SS and turbidity from marble processing wastewaters, and operating costs with monopolar Al electrodes in parallel connection were lower than those of the serial connection and of all the Fe electrode configurations.

  9. Quantitative Image Feature Engine (QIFE): an Open-Source, Modular Engine for 3D Quantitative Feature Extraction from Volumetric Medical Images.

    PubMed

    Echegaray, Sebastian; Bakr, Shaimaa; Rubin, Daniel L; Napel, Sandy

    2017-10-06

    The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to GitHub, and (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (h:mm) using one core, and in 1:04 (h:mm) using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.

  10. A powered prosthetic ankle joint for walking and running.

    PubMed

    Grimmer, Martin; Holgate, Matthew; Holgate, Robert; Boehler, Alexander; Ward, Jeffrey; Hollander, Kevin; Sugar, Thomas; Seyfarth, André

    2016-12-19

    Current prosthetic ankle joints are designed either for walking or for running. In order to mimic the capabilities of an able-bodied person, a powered prosthetic ankle for both walking and running was designed. A powered system has the potential to reduce the limitations in range of motion and positive work output of passive walking and running feet. To perform the experiments, a controller capable of transitions between standing, walking, and running with speed adaptations was developed. In the first case study, the system was mounted on an ankle bypass in parallel with the foot of a non-amputee subject. By this method, the functionality of the hardware and controller was proven. The Walk-Run ankle was capable of mimicking desired torque and angle trajectories in walking and running up to 2.6 m/s. At 4 m/s running, the ankle angle could be matched while the ankle torque could not. Limited ankle output power resulting from a suboptimal spring stiffness value was identified as a main reason. Further studies have to show to what extent the findings can be transferred to amputees.

  11. Improvements of the ALICE HLT data transport framework for LHC Run 2

    NASA Astrophysics Data System (ADS)

    Rohr, David; Krzwicki, Mikolaj; Engel, Heiko; Lehrbach, Johannes; Lindenstruth, Volker; ALICE Collaboration

    2017-10-01

    The ALICE HLT uses a data transport framework based on the publisher-subscriber message principle, which transparently handles the communication between processing components over the network and between processing components on the same node via shared memory with a zero-copy approach. We present an analysis of the performance in terms of maximum achievable data rates and event rates as well as processing capabilities during Run 1 and Run 2. Based on this analysis, we present new optimizations we have developed for ALICE in Run 2. These include support for asynchronous transport via ZeroMQ, which enables loops in the reconstruction chain graph and which is used to ship QA histograms to DQM. We have added asynchronous processing capabilities in order to support long-running tasks besides the event-synchronous reconstruction tasks in normal HLT operation. These asynchronous components run in an isolated process such that the HLT as a whole is resilient even to fatal errors in these asynchronous components. In this way, we can ensure that new developments cannot break data taking. On top of that, we have tuned the processing chain to cope with the higher event and data rates expected from the new TPC readout electronics (RCU2), and we have improved the configuration procedure and the startup time in order to increase the time during which ALICE can take physics data. We analyze the maximum achievable data processing rates taking into account processing capabilities of CPUs and GPUs, buffer sizes, network bandwidth, the incoming links from the detectors, and the outgoing links to data acquisition.
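
    The publisher-subscriber principle mentioned above can be sketched with pyzmq. This is a generic illustration of the messaging pattern only, not the ALICE HLT framework or its zero-copy shared-memory transport; the topic name and endpoint are invented.

```python
# Generic ZeroMQ publisher-subscriber pair (pip install pyzmq);
# a topic filter lets subscribers receive only the streams they want,
# loosely analogous to shipping QA histograms to a monitoring consumer.
import threading
import time
import zmq

def publisher(ctx):
    sock = ctx.socket(zmq.PUB)
    sock.bind("tcp://127.0.0.1:5556")
    time.sleep(0.2)  # give the subscriber time to connect (slow joiner)
    for i in range(3):
        sock.send_multipart([b"qa_histos", f"event {i}".encode()])
    sock.send_multipart([b"qa_histos", b"DONE"])
    sock.close()

def subscriber(ctx):
    sock = ctx.socket(zmq.SUB)
    sock.connect("tcp://127.0.0.1:5556")
    sock.setsockopt(zmq.SUBSCRIBE, b"qa_histos")  # topic filter
    while True:
        topic, payload = sock.recv_multipart()
        if payload == b"DONE":
            break
        print(topic.decode(), payload.decode())
    sock.close()

ctx = zmq.Context()
t = threading.Thread(target=subscriber, args=(ctx,))
t.start()
publisher(ctx)
t.join()
ctx.term()
```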

  12. Argonne Simulation Framework for Intelligent Transportation Systems

    DOT National Transportation Integrated Search

    1996-01-01

    A simulation framework has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed to run on parallel computers and distribu...

  13. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Earth Sciences Division; Zhang, Keni

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick-start guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.

  14. Terascale Cluster for Advanced Turbulent Combustion Simulations

    DTIC Science & Technology

    2008-07-25

    We have given the name CATS (for Combustion And Turbulence Simulator) to the terascale system that was obtained through this grant. CATS ... InfiniBand interconnect. CATS includes an interactive login node and a file server, each holding in excess of 1 terabyte of file storage. The 35 active ... compute nodes of CATS enable us to run up to 140-core parallel MPI batch jobs; one node is reserved to run the scheduler. CATS is operated and

  15. Coupled Ocean/Atmospheric Mesoscale Prediction System (COAMPS), Version 5.0 (User’s Guide)

    DTIC Science & Technology

    2010-03-30

    provides tools for common modeling functions, as well as regridding, data decomposition, and communication on parallel computers. ... specified gncomDir. If running COAMPS at the DSRC (e.g. BABBAGE, DAVINCI, or EINSTEIN), the global NCOM files will be copied to /scr/[user]/COAMPS/data ... the site (DSRC or local) and the platform (BABBAGE, DAVINCI, EINSTEIN, or local machine) on which COAMPS is being run. site=navy_dsrc (for DSRC

  16. Demonstration and Commercialization of the Sediment Ecosystem Assessment Protocol (SEAP)

    DTIC Science & Technology

    2017-07-09

    undergone severe erosion (Peeling 1975). Zuniga Jetty, which runs parallel to Point Loma at the bay’s inlet, was built to control erosion near the inlet ... consistent conditions and level of effort required to run the tests. A per-site unit cost is less amenable to a field-based deployment, given the many ... support in situ testing: 1) a standard exposure of spores to a reference toxicant dilution series; and 2) exposure of sporophyll blades to a

  17. The Ophidia framework: toward cloud-based data analytics for climate change

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; D'Anca, Alessandro; Elia, Donatello; Mancini, Marco; Mariello, Andrea; Mirto, Maria; Palazzo, Cosimo; Aloisio, Giovanni

    2015-04-01

    The Ophidia project is a research effort on big data analytics addressing scientific data analysis challenges in the climate change domain. It provides parallel (server-side) data analysis, an internal storage model, and a hierarchical data organization to manage large amounts of multidimensional scientific data. The Ophidia analytics platform provides several MPI-based parallel operators to manipulate large datasets (data cubes) and array-based primitives to perform data analysis on large arrays of scientific data. The most relevant data analytics use cases implemented in national and international projects target fire danger prevention (OFIDIA), interactions between climate change and biodiversity (EUBrazilCC), climate indicators and remote data analysis (CLIP-C), sea situational awareness (TESSA), and large-scale data analytics on CMIP5 data in NetCDF format, compliant with the Climate and Forecast (CF) convention (ExArch). Two use cases regarding the EU FP7 EUBrazil Cloud Connect and the INTERREG OFIDIA projects will be presented during the talk. In the former case (EUBrazilCC), the Ophidia framework is being extended to integrate scalable VM-based solutions for the management of large volumes of scientific data (both climate and satellite data) in a cloud-based environment to study how climate change affects biodiversity. In the latter (OFIDIA), the data analytics framework is being exploited to provide operational support for processing chains devoted to fire danger prevention. To tackle the project challenges, data analytics workflows consisting of about 130 operators perform, among others, parallel data analysis, metadata management, virtual file system tasks, map generation, rolling of datasets, and import/export of datasets in NetCDF format. Finally, the entire Ophidia software stack has been deployed at CMCC on 24 nodes (16 cores/node) of the Athena HPC cluster. Moreover, a cloud-based release tested with OpenNebula is also available and running in the private cloud infrastructure of the CMCC Supercomputing Centre.

  18. Running Neuroimaging Applications on Amazon Web Services: How, When, and at What Cost?

    PubMed

    Madhyastha, Tara M; Koh, Natalie; Day, Trevor K M; Hernández-Fernández, Moises; Kelley, Austin; Peterson, Daniel J; Rajan, Sabreena; Woelfer, Karl A; Wolf, Jonathan; Grabowski, Thomas J

    2017-01-01

    The contribution of this paper is to identify and describe current best practices for using Amazon Web Services (AWS) to execute neuroimaging workflows "in the cloud." Neuroimaging offers a vast set of techniques by which to interrogate the structure and function of the living brain. However, many of the scientists for whom neuroimaging is an extremely important tool have limited training in parallel computation. At the same time, the field is experiencing a surge in computational demands, driven by a combination of data-sharing efforts, improvements in scanner technology that allow acquisition of images with higher image resolution, and by the desire to use statistical techniques that stress processing requirements. Most neuroimaging workflows can be executed as independent parallel jobs and are therefore excellent candidates for running on AWS, but the overhead of learning to do so and determining whether it is worth the cost can be prohibitive. In this paper we describe how to identify neuroimaging workloads that are appropriate for running on AWS, how to benchmark execution time, and how to estimate cost of running on AWS. By benchmarking common neuroimaging applications, we show that cloud computing can be a viable alternative to on-premises hardware. We present guidelines that neuroimaging labs can use to provide a cluster-on-demand type of service that should be familiar to users, and scripts to estimate cost and create such a cluster.
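
    The cost-estimation step described above amounts to simple arithmetic over instance pricing. The sketch below uses hypothetical prices and timings, not the paper's benchmark figures.

```python
# Back-of-the-envelope cost model for independent parallel jobs on
# cloud instances; all numbers are hypothetical placeholders, not
# benchmarked AWS figures.
def estimate_cost(n_subjects, hours_per_subject, jobs_per_instance,
                  price_per_instance_hour):
    # Billable instance-hours = total job-hours / jobs packed per instance.
    instance_hours = n_subjects * hours_per_subject / jobs_per_instance
    return instance_hours * price_per_instance_hour

# e.g. 100 subjects, 3 h each, 8 concurrent jobs per instance, $0.40/h
print(f"${estimate_cost(100, 3.0, 8, 0.40):.2f}")  # -> $15.00
```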

  19. Study of the mapping of Navier-Stokes algorithms onto multiple-instruction/multiple-data-stream computers

    NASA Technical Reports Server (NTRS)

    Eberhardt, D. S.; Baganoff, D.; Stevens, K.

    1984-01-01

    Implicit approximate-factored algorithms have certain properties that are suitable for parallel processing. A particular computational fluid dynamics (CFD) code, using this algorithm, is mapped onto a multiple-instruction/multiple-data-stream (MIMD) computer architecture. An explanation of this mapping procedure is presented, as well as some of the difficulties encountered when trying to run the code concurrently. Timing results are given for runs on the Ames Research Center's MIMD test facility which consists of two VAX 11/780's with a common MA780 multi-ported memory. Speedups exceeding 1.9 for characteristic CFD runs were indicated by the timing results.

  20. Automated JPSS VIIRS GEO code change testing by using Chain Run Scripts

    NASA Astrophysics Data System (ADS)

    Chen, W.; Wang, W.; Zhao, Q.; Das, B.; Mikles, V. J.; Sprietzer, K.; Tsidulko, M.; Zhao, Y.; Dharmawardane, V.; Wolf, W.

    2015-12-01

    The Joint Polar Satellite System (JPSS) is the next generation polar-orbiting operational environmental satellite system. The first satellite in the JPSS series of satellites, J-1, is scheduled to launch in early 2017. J1 will carry similar versions of the instruments that are on board the Suomi National Polar-Orbiting Partnership (S-NPP) satellite, which was launched on October 28, 2011. The Center for Satellite Applications and Research Algorithm Integration Team (STAR AIT) uses the Algorithm Development Library (ADL) to run S-NPP and pre-J1 algorithms in a development and test mode. The ADL is an offline test system developed by Raytheon to mimic the operational system while enabling a development environment for plug-and-play algorithms. The Perl Chain Run Scripts have been developed by STAR AIT to automate the staging and processing of multiple JPSS Sensor Data Record (SDR) and Environmental Data Record (EDR) products. The JPSS J1 VIIRS Day Night Band (DNB) has an anomalous non-linear response at high scan angles based on prelaunch testing. The flight project has proposed multiple mitigation options through onboard aggregation, and Option 21 has been suggested by the VIIRS SDR team as the baseline aggregation mode. VIIRS GEOlocation (GEO) code analysis results show that the J1 DNB GEO product cannot be generated correctly without the software update. The modified code will support both Op21 and Op21/26, and is backward compatible with S-NPP. The J1 GEO code change version 0 delivery package is under development for the current change request. In this presentation, we will discuss how to use the Chain Run Scripts to verify the code change and Lookup Table (LUT) updates in ADL Block2.

  1. DNA Assembly with De Bruijn Graphs Using an FPGA Platform.

    PubMed

    Poirier, Carl; Gosselin, Benoit; Fortier, Paul

    2018-01-01

    This paper presents an FPGA implementation of a DNA assembly algorithm, called Ray, initially developed to run on parallel CPUs. The OpenCL language is used and the focus is placed on modifying and optimizing the original algorithm to better suit the new parallelization tool and the radically different hardware architecture. The results show that the execution time is roughly one fourth that of the CPU and factoring energy consumption yields a tenfold savings.
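
    A de Bruijn graph of the kind Ray assembles can be sketched in a few lines of Python. Real assemblers add error correction, reverse complements, and distributed or FPGA-resident storage, none of which is shown here.

```python
# Minimal de Bruijn graph construction from reads: each k-mer
# contributes an edge from its (k-1)-mer prefix to its (k-1)-mer
# suffix; contigs correspond to paths through this graph.
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes following it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

reads = ["ACGTAC", "CGTACG"]
for node, successors in de_bruijn(reads, 4).items():
    print(node, "->", successors)
```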

  2. Multiprocessor graphics computation and display using transputers

    NASA Technical Reports Server (NTRS)

    Ellis, Graham K.

    1988-01-01

    A package of two-dimensional graphics routines was developed to run on a transputer-based parallel processing system. These routines were designed to enable applications programmers to easily generate and display results from the transputer network in a graphic format. The graphics procedures were designed for the lowest possible network communication overhead for increased performance. The routines were designed for ease of use and to present an intuitive approach to generating graphics on the transputer parallel processing system.

  3. Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

    PubMed

    Wilson, J Adam; Williams, Justin C

    2009-01-01

    The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 250 ms of data from 1000 channels in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
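
    The first two steps of the signal chain can be sketched with NumPy as follows. A plain FFT periodogram stands in here for the autoregressive PSD estimator used in the study, and the filter matrix and data shapes are illustrative assumptions.

```python
# CPU sketch of the described chain: spatial filtering as one
# matrix-matrix product, then a per-channel power spectrum.
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_samples = 1000, 300          # e.g. a short analysis window
data = rng.standard_normal((n_channels, n_samples))

# Spatial filter: each output channel is a weighted sum of inputs;
# a common-average-reference matrix is shown as one concrete choice.
W = np.eye(n_channels) - 1.0 / n_channels
filtered = W @ data                        # the matrix-matrix multiplication

# Per-channel power spectrum (periodogram in place of the AR method).
psd = np.abs(np.fft.rfft(filtered, axis=1)) ** 2 / n_samples
print(psd.shape)                           # (1000, 151)
```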

  4. An object-oriented approach to nested data parallelism

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.; Chatterjee, Siddhartha

    1994-01-01

    This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
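
    The flattening transformation can be illustrated in miniature: a nested foreach becomes one flat elementwise pass plus segment-length bookkeeping to restore the nesting. The Python sketch below shows the idea only; the paper's transformation operates on C++ vector templates.

```python
# Toy 'flattening' of nested data parallelism: record the segment
# lengths, run one flat elementwise operation, then rebuild nesting.
def nested_foreach(nested, op):
    lengths = [len(seg) for seg in nested]        # remember the structure
    flat = [x for seg in nested for x in seg]     # flatten
    flat = [op(x) for x in flat]                  # one flat data-parallel op
    out, pos = [], 0
    for n in lengths:                             # unflatten
        out.append(flat[pos:pos + n])
        pos += n
    return out

print(nested_foreach([[1, 2], [3], [4, 5, 6]], lambda x: x * x))
# [[1, 4], [9], [16, 25, 36]]
```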

  5. A mechanism for efficient debugging of parallel programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, B.P.; Choi, J.D.

    1988-01-01

    This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. They extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions of the cooperating processes.

  6. Parallel separations using capillary electrophoresis on a multilane microchip with multiplexed laser-induced fluorescence detection.

    PubMed

    Nikcevic, Irena; Piruska, Aigars; Wehmeyer, Kenneth R; Seliskar, Carl J; Limbach, Patrick A; Heineman, William R

    2010-08-01

    Parallel separations using CE on a multilane microchip with multiplexed LIF detection are demonstrated. The detection system was developed to simultaneously record data on all channels using an expanded laser beam for excitation, a camera lens to capture emission, and a CCD camera for detection. The detection system enables monitoring of each channel continuously and distinguishing individual lanes without significant crosstalk between adjacent lanes. Multiple analytes can be determined in parallel lanes within a single microchip in a single run, leading to increased sample throughput. The pKa determination of small molecule analytes is demonstrated with the multilane microchip.

  7. Parallel separations using capillary electrophoresis on a multilane microchip with multiplexed laser induced fluorescence detection

    PubMed Central

    Nikcevic, Irena; Piruska, Aigars; Wehmeyer, Kenneth R.; Seliskar, Carl J.; Limbach, Patrick A.; Heineman, William R.

    2010-01-01

    Parallel separations using capillary electrophoresis on a multilane microchip with multiplexed laser induced fluorescence detection are demonstrated. The detection system was developed to simultaneously record data on all channels using an expanded laser beam for excitation, a camera lens to capture emission, and a CCD camera for detection. The detection system enables monitoring of each channel continuously and distinguishing individual lanes without significant crosstalk between adjacent lanes. Multiple analytes can be analyzed on parallel lanes within a single microchip in a single run, leading to increased sample throughput. The pKa determination of small molecule analytes is demonstrated with the multilane microchip. PMID:20737446

  8. A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lai, Pai-Wei; Stock, Kevin; Rajbhandari, Samyam

    In this paper, we introduce the Dynamic Load-balanced Tensor Contractions (DLTC) framework, a domain-specific library for efficient task-parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller units of work (tasks), represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by having tasks across independent contractions executed concurrently through a dynamic load-balancing runtime. We demonstrate the improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers using examples from coupled cluster methods.
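
    The task-pool idea can be sketched as follows: each contraction is decomposed into independent block tasks, and tasks from all contractions share one worker pool so idle workers can pick up work from any expression. This is a generic Python illustration, not the DLTC library; block shapes and sizes are arbitrary.

```python
# Dynamic task partitioning across independent contractions: split
# each matrix product into row-block tasks and feed all tasks from
# all contractions to one shared worker pool.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def block_task(args):
    a_block, b = args
    return a_block @ b                 # one unit of contraction work

def make_tasks(a, b, blk):
    """Split C = A @ B into independent row-block tasks."""
    return [(a[i:i + blk], b) for i in range(0, a.shape[0], blk)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    contractions = [(rng.random((64, 32)), rng.random((32, 16))),
                    (rng.random((128, 8)), rng.random((8, 24)))]
    # Interleave tasks from both contractions in one pool.
    tasks = [t for a, b in contractions for t in make_tasks(a, b, 16)]
    with ProcessPoolExecutor(max_workers=4) as ex:
        blocks = list(ex.map(block_task, tasks))
    print(len(blocks), "blocks computed")
```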

  9. Simulation of ozone production in a complex circulation region using nested grids

    NASA Astrophysics Data System (ADS)

    Taghavi, M.; Cautenet, S.; Foret, G.

    2003-07-01

    During the ESCOMPTE precampaign (15 June to 10 July 2000), three days of intensive pollution (IOP0) were observed and simulated. The comprehensive RAMS model, version 4.3, coupled online with a chemical module including 29 species, has been used to follow the chemistry of the polluted zone over southern France. This online method can be used because the code is parallelized and the SGI 3800 computer is very powerful. Two runs have been performed: run 1 with one grid and run 2 with two nested grids. The redistribution of simulated chemical species (ozone, carbon monoxide, sulphur dioxide and nitrogen oxides) was compared to aircraft measurements and surface stations. The two-grid run gave substantially better results than the one-grid run because the former takes the outer pollutants into account. This online method helps to explain the dynamics and to retrieve the chemical species redistribution with good agreement.

  10. Simulation of ozone production in a complex circulation region using nested grids

    NASA Astrophysics Data System (ADS)

    Taghavi, M.; Cautenet, S.; Foret, G.

    2004-06-01

    During the ESCOMPTE precampaign (summer 2000, over Southern France), a 3-day period of intensive observation (IOP0), associated with ozone peaks, has been simulated. The comprehensive RAMS model, version 4.3, coupled on-line with a chemical module including 29 species, is used to follow the chemistry of the polluted zone. This efficient but time-consuming method can be used because the code is installed on a parallel computer, the SGI 3800. Two runs are performed: run 1 with a single grid and run 2 with two nested grids. The simulated fields of ozone, carbon monoxide, nitrogen oxides and sulfur dioxide are compared with aircraft and surface station measurements. The 2-grid run looks substantially better than the run with one grid because the former takes the outer pollutants into account. This on-line method helps to satisfactorily retrieve the chemical species redistribution and to explain the impact of dynamics on this redistribution.

  11. Development of structural schemes of parallel structure manipulators using screw calculus

    NASA Astrophysics Data System (ADS)

    Rashoyan, G. V.; Shalyukhin, K. A.; Gaponenko, E. V.

    2018-03-01

    The paper considers an approach to the structural analysis and synthesis of parallel-structure robots based on the mathematical apparatus of groups of screws and on the concept of reciprocity of screws. Results are presented for the synthesis of parallel-structure robots with different numbers of degrees of freedom, corresponding to the different groups of screws. Power screws are applied to this end, based on the principle of static-kinematic analogy; the power screws are analogous to the unit vectors of the axes of the non-driven kinematic pairs of a corresponding connecting chain. Accordingly, the kinematic screws of the output chain of a robot, which are reciprocal to the power screws of the kinematic sub-chains, are determined simultaneously. The solution of certain synthesis problems is illustrated with practical applications. Closed groups of screws can be of eight types. The three-membered groups of screws are of greatest significance, as well as the four-membered screw groups [1] and the six-membered screw groups. Three-membered screw groups correspond to translationally guiding mechanisms, spherical mechanisms, and planar mechanisms. The four-membered group corresponds to the motion of the SCARA robot. The six-membered group includes all possible motions. From the works of A. P. Kotelnikov and F. M. Dimentberg, it is known that closed fifth-order screw groups do not exist. The article presents examples of the mechanisms corresponding to the given groups.

  12. Strong contributions from vertical triads to helix-partner preferences in parallel coiled coils.

    PubMed

    Steinkruger, Jay D; Bartlett, Gail J; Woolfson, Derek N; Gellman, Samuel H

    2012-09-26

    Pairing preferences in heterodimeric coiled coils are determined by complementarities among side chains that pack against one another at the helix-helix interface. However, relationships between dimer stability and interfacial residue identity are not fully understood. In the context of the "knobs-into-holes" (KIH) packing pattern, one can identify two classes of interactions between side chains from different helices: "lateral", in which a line connecting the adjacent side chains is perpendicular to the helix axes, and "vertical", in which the connecting line is parallel to the helix axes. We have previously analyzed vertical interactions in antiparallel coiled coils and found that one type of triad constellation (a'-a-a') exerts a strong effect on pairing preferences, while the other type of triad (d'-d-d') has relatively little impact on pairing tendencies. Here, we ask whether vertical interactions (d'-a-d') influence pairing in parallel coiled-coil dimers. Our results indicate that vertical interactions can exert a substantial impact on pairing specificity, and that the influence of the d'-a-d' triad depends on the lateral a' contact within the local KIH motif. Structure-informed bioinformatic analyses of protein sequences reveal trends consistent with the thermodynamic data derived from our experimental model system in suggesting that heterotriads involving Leu and Ile are preferred over homotriads involving Leu and Ile.

  13. Application of a hybrid MPI/OpenMP approach for parallel groundwater model calibration using multi-core computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified to account for over 97% of the total computational time using GPROF. Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects over 90% cache miss rates. With this loop rewritten, a similar speedup to the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speed up model calibration. To run calibration on clusters as a single task, the Levenberg-Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100-200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most existing groundwater model codes for many applications.

  14. Zincobotryogen, ZnFe3+(SO4)2(OH)·7H2O: validation as a mineral species and new data

    NASA Astrophysics Data System (ADS)

    Yang, Zhuming; Giester, Gerald; Mao, Qian; Ma, Yuguang; Zhang, Di; Li, He

    2017-06-01

    Zincobotryogen occurs in the oxidation zone of the Xitieshan lead-zinc deposit, Qinghai, China. The mineral is associated with jarosite, copiapite, zincocopiapite, and quartz. The mineral forms prismatic crystals, 0.05 to 2 mm in size. It is optically positive (2Vcalc = 54.1°), with Z ‖ b and X ∧ c = 10°. The elongation is negative. The refractive indices are nα = 1.542(5), nβ = 1.551(5), nγ = 1.587(5). The pleochroism scheme is X = colorless, Y = light yellow, Z = yellow. Microprobe analysis gave (in wt%): SO3 = 38.04, Al2O3 = 0.04, Fe2O3 = 18.46, ZnO = 13.75, MgO = 1.52, MnO = 1.23, H2O = 31.06 (by calculation), Total = 104.10. The simplified formula is (Zn,Mg)Fe3+(SO4)2(OH)·7H2O. The mineral is monoclinic, P121/n1, with a = 10.504(2), b = 17.801(4), c = 7.1263(14) Å, β = 100.08(3)°, V = 1311.9(5) Å3, Z = 4. The strongest lines in the powder X-ray diffraction pattern d(I)(hkl) are: 8.92 (100)(110), 6.32 (77)(-101), 5.56 (23)(021), 4.08 (22)(-221), 3.21 (31)(231), 3.03 (34)(032), 2.77 (22)(042). The crystal structure was refined using 2816 unique reflections to R1(F) = 0.0355 and wR2(F2) = 0.0651. The refined formula is (Zn0.84Mg0.16)Fe3+(SO4)2(OH)·7H2O. The atomic arrangement is characterized by chains with composition [Fe3+(SO4)2(OH)(H2O)]2- and a 7 Å repeat distance running parallel to the c axis. The chain links to a [MO(H2O)5] octahedron (M = Zn, Mg) and an unshared H2O molecule, forming a larger chain building module with composition [M2+Fe3+(SO4)2(OH)(H2O)6(H2O)]. The inter-chain module linkage involves only hydrogen bonding.

  15. Comparative evaluation of the liver in dogs with a splenic mass by using ultrasonography and contrast-enhanced computed tomography

    PubMed Central

    Irausquin, Roelof A.; Scavelli, Thomas D.; Corti, Lisa; Stefanacci, Joseph D.; DeMarco, Joann; Flood, Shannon; Rohrbach, Barton W.

    2008-01-01

    Evaluation of dogs with splenic masses to better educate owners as to the extent of the disease is a goal of many research studies. We compared the use of ultrasonography (US) and contrast-enhanced computed tomography (CT) to evaluate the accuracy of detecting hepatic neoplasia in dogs with splenic masses, independently, in series, or in parallel. No significant difference was found between US and CT. If the presence or absence of ascites, as detected with US, was used as a pretest probability of disease in our population, the positive predictive value increased to 94% if the tests were run in series, and the negative predictive value increased to 95% if the tests were run in parallel. The study showed that CT combined with US could be a valuable tool in evaluation of dogs with splenic masses. PMID:18320977
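
    The series/parallel combination referenced above follows standard formulas for two independent tests: a series rule calls disease only if both tests are positive (raising PPV), while a parallel rule calls disease if either is positive (raising NPV). The sketch below uses made-up sensitivities, specificities, and prevalence rather than the study's data.

```python
# Standard combination rules for two independent diagnostic tests,
# plus Bayes' rule for predictive values; all inputs are illustrative.
def combine(se1, sp1, se2, sp2, mode):
    if mode == "series":              # positive only if both positive
        se = se1 * se2
        sp = 1 - (1 - sp1) * (1 - sp2)
    else:                             # parallel: positive if either positive
        se = 1 - (1 - se1) * (1 - se2)
        sp = sp1 * sp2
    return se, sp

def ppv_npv(se, sp, prevalence):
    p = prevalence
    ppv = se * p / (se * p + (1 - sp) * (1 - p))
    npv = sp * (1 - p) / (sp * (1 - p) + (1 - se) * p)
    return ppv, npv

for mode in ("series", "parallel"):
    se, sp = combine(0.85, 0.80, 0.90, 0.75, mode)
    ppv, npv = ppv_npv(se, sp, 0.5)
    print(f"{mode}: PPV={ppv:.3f} NPV={npv:.3f}")
```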

  16. Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

    DOEpatents

    Gara, Alan; Ohmacht, Martin

    2014-09-16

    In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write-through, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation-blind first level cache.

  17. Armor structures

    DOEpatents

    Chu, Henry Shiu-Hung [Idaho Falls, ID; Lacy, Jeffrey M [Idaho Falls, ID

    2008-04-01

    An armor structure includes first and second layers individually containing a plurality of i-beams. Individual i-beams have a pair of longitudinal flanges interconnected by a longitudinal crosspiece and defining opposing longitudinal channels between the pair of flanges. The i-beams within individual of the first and second layers run parallel. The laterally outermost faces of the flanges of adjacent i-beams face one another. One of the longitudinal channels in each of the first and second layers faces one of the longitudinal channels in the other of the first and second layers. The channels of the first layer run parallel with the channels of the second layer. The flanges of the first and second layers overlap with the crosspieces of the other of the first and second layers, and portions of said flanges are received within the facing channels of the i-beams of the other of the first and second layers.

  18. An Evaluation of Kernel Equating: Parallel Equating with Classical Methods in the SAT Subject Tests[TM] Program. Research Report. ETS RR-09-06

    ERIC Educational Resources Information Center

    Grant, Mary C.; Zhang, Lilly; Damiano, Michele

    2009-01-01

    This study investigated kernel equating methods by comparing these methods to operational equatings for two tests in the SAT Subject Tests[TM] program. GENASYS (ETS, 2007) was used for all equating methods and scaled score kernel equating results were compared to Tucker, Levine observed score, chained linear, and chained equipercentile equating…

  19. Carotid chemoreceptors tune breathing via multipath routing: reticular chain and loop operations supported by parallel spike train correlations.

    PubMed

    Morris, Kendall F; Nuding, Sarah C; Segers, Lauren S; Iceman, Kimberly E; O'Connor, Russell; Dean, Jay B; Ott, Mackenzie M; Alencar, Pierina A; Shuman, Dale; Horton, Kofi-Kermit; Taylor-Clark, Thomas E; Bolser, Donald C; Lindsey, Bruce G

    2018-02-01

    We tested the hypothesis that carotid chemoreceptors tune breathing through parallel circuit paths that target distinct elements of an inspiratory neuron chain in the ventral respiratory column (VRC). Microelectrode arrays were used to monitor neuronal spike trains simultaneously in the VRC, peri-nucleus tractus solitarius (p-NTS)-medial medulla, the dorsal parafacial region of the lateral tegmental field (FTL-pF), and medullary raphe nuclei together with phrenic nerve activity during selective stimulation of carotid chemoreceptors or transient hypoxia in 19 decerebrate, neuromuscularly blocked, and artificially ventilated cats. Of 994 neurons tested, 56% had a significant change in firing rate. A total of 33,422 cell pairs were evaluated for signs of functional interaction; 63% of chemoresponsive neurons were elements of at least one pair with correlational signatures indicative of paucisynaptic relationships. We detected evidence for postinspiratory neuron inhibition of rostral VRC I-Driver (pre-Bötzinger) neurons, an interaction predicted to modulate breathing frequency, and for reciprocal excitation between chemoresponsive p-NTS neurons and more downstream VRC inspiratory neurons for control of breathing depth. Chemoresponsive pericolumnar tonic expiratory neurons, proposed to amplify inspiratory drive by disinhibition, were correlationally linked to afferent and efferent "chains" of chemoresponsive neurons extending to all monitored regions. The chains included coordinated clusters of chemoresponsive FTL-pF neurons with functional links to widespread medullary sites involved in the control of breathing. The results support long-standing concepts on brain stem network architecture and a circuit model for peripheral chemoreceptor modulation of breathing with multiple circuit loops and chains tuned by tegmental field neurons with quasi-periodic discharge patterns. NEW & NOTEWORTHY We tested the long-standing hypothesis that carotid chemoreceptors tune the frequency and depth of breathing through parallel circuit operations targeting the ventral respiratory column. Responses to stimulation of the chemoreceptors and identified functional connectivity support differential tuning of inspiratory neuron burst duration and firing rate and a model of brain stem network architecture incorporating tonic expiratory "hub" neurons regulated by convergent neuronal chains and loops through rostral lateral tegmental field neurons with quasi-periodic discharge patterns.

  20. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Shuangshuang; Chen, Yousu; Wu, Di

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch-switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor-based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.

  1. Fast data preprocessing with Graphics Processing Units for inverse problem solving in light-scattering measurements

    NASA Astrophysics Data System (ADS)

    Derkachov, G.; Jakubczyk, T.; Jakubczyk, D.; Archer, J.; Woźniak, M.

    2017-07-01

    Utilising the Compute Unified Device Architecture (CUDA) platform for Graphics Processing Units (GPUs) enables significant reduction of computation time at a moderate cost, by means of parallel computing. In the paper [Jakubczyk et al., Opto-Electron. Rev., 2016] we reported using a GPU for Mie scattering inverse problem solving (up to 800-fold speed-up). Here we report the development of two subroutines utilising the GPU at data preprocessing stages for the inversion procedure: (i) a subroutine, based on ray tracing, for finding the spherical aberration correction function, and (ii) a subroutine performing the conversion of an image to a 1D distribution of light intensity versus azimuth angle (i.e. a scattering diagram), fed from a movie-reading CPU subroutine running in parallel. All subroutines are incorporated in the PikeReader application, which we make available in a GitHub repository. PikeReader returns a sequence of intensity distributions versus a common azimuth angle vector, corresponding to the recorded movie. We obtained an overall ∼400-fold speed-up of calculations at data preprocessing stages using CUDA code running on the GPU in comparison to a single-thread MATLAB-only code running on the CPU.
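
    The image-to-scattering-diagram conversion, step (ii) above, reduces to binning pixel intensities by azimuth angle around a chosen center. A CPU-side NumPy sketch of that reduction (with an arbitrary test image) follows; the paper performs the equivalent reduction in CUDA on the GPU.

```python
# Convert an image to a 1D mean-intensity-vs-azimuth distribution by
# binning pixels on their angle around a chosen center point.
import numpy as np

def azimuth_profile(img, cx, cy, n_bins=360):
    ys, xs = np.indices(img.shape)
    angles = np.arctan2(ys - cy, xs - cx)            # [-pi, pi] per pixel
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    sums = np.bincount(bins.ravel(), weights=img.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)              # mean intensity per bin

img = np.random.default_rng(0).random((480, 640))    # stand-in frame
profile = azimuth_profile(img, cx=320, cy=240)
print(profile.shape)                                  # (360,)
```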

  2. Quantitative and qualitative measure of intralaboratory two-dimensional protein gel reproducibility and the effects of sample preparation, sample load, and image analysis.

    PubMed

    Choe, Leila H; Lee, Kelvin H

    2003-10-01

    We investigate one approach to assess the quantitative variability in two-dimensional gel electrophoresis (2-DE) separations based on gel-to-gel variability, sample preparation variability, sample load differences, and the effect of automation on image analysis. We observe that 95% of spots present in three out of four replicate gels exhibit less than a 0.52 coefficient of variation (CV) in fluorescent stain intensity (% volume) for a single sample run on multiple gels. When four parallel sample preparations are performed, this value increases to 0.57. We do not observe any significant change in quantitative value for an increase or decrease in sample load of 30% when using appropriate image analysis variables. Increasing use of automation, while necessary in modern 2-DE experiments, does change the observed level of quantitative and qualitative variability among replicate gels. The number of spots that change qualitatively for a single sample run in parallel varies from a CV = 0.03 for fully manual analysis to CV = 0.20 for a fully automated analysis. We present a systematic method by which a single laboratory can measure gel-to-gel variability using only three gel runs.

  3. Compute Server Performance Results

    NASA Technical Reports Server (NTRS)

    Stockdale, I. E.; Barton, John; Woodrow, Thomas (Technical Monitor)

    1994-01-01

    Parallel-vector supercomputers have been the workhorses of high performance computing. As expectations of future computing needs have risen faster than projected vector supercomputer performance, much work has been done investigating the feasibility of using Massively Parallel Processor systems as supercomputers. An even more recent development is the availability of high performance workstations which have the potential, when clustered together, to replace parallel-vector systems. We present a systematic comparison of floating point performance and price-performance for various compute server systems. A suite of highly vectorized programs was run on systems including traditional vector systems such as the Cray C90, and RISC workstations such as the IBM RS/6000 590 and the SGI R8000. The C90 system delivers 460 million floating point operations per second (FLOPS), the highest single-processor rate of any vendor. However, if the price-performance ratio (PPR) is considered to be most important, then the IBM and SGI processors are superior to the C90 processors. Even without code tuning, the IBM and SGI PPRs of 260 and 220 FLOPS per dollar exceed the C90 PPR of 160 FLOPS per dollar when running our highly vectorized suite.

  4. PREMER: a Tool to Infer Biological Networks.

    PubMed

    Villaverde, Alejandro F; Becker, Kolja; Banga, Julio R

    2017-10-04

    Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features - such as distinguishing between direct and indirect interactions or determining the direction of a causal link - requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux and OSX (https://sites.google.com/site/premertoolbox/).
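
    PREMER's information-theoretic criteria build on pairwise mutual information. The NumPy sketch below shows a basic histogram estimate of that quantity; PREMER itself uses compiled FORTRAN/OpenMP code and multidimensional extensions (entropy reduction, causality detection) not shown here.

```python
# Histogram-based estimate of the mutual information between two
# variables: MI = sum p(x,y) * log(p(x,y) / (p(x) * p(y))).
import numpy as np

def mutual_information(x, y, bins=16):
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                          # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginals
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                              # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
print(mutual_information(x, 2 * x + 0.1 * rng.standard_normal(5000)))
print(mutual_information(x, rng.standard_normal(5000)))  # near zero
```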

  5. pFlogger: The Parallel Fortran Logging Utility

    NASA Technical Reports Server (NTRS)

    Clune, Tom; Cruz, Carlos A.

    2017-01-01

    In the context of high performance computing (HPC), software investments in support of text-based diagnostics, which monitor a running application, are typically limited compared to those for other types of IO. Examples of such diagnostics include reiteration of configuration parameters, progress indicators, simple metrics (e.g., mass conservation, convergence of solvers, etc.), and timers. To some degree, this difference in priority is justifiable as other forms of output are the primary products of a scientific model and, due to their large data volume, much more likely to be a significant performance concern. In contrast, text-based diagnostic content is generally not shared beyond the individual or group running an application and is most often used to troubleshoot when something goes wrong. We suggest that a more systematic approach enabled by a logging facility (or 'logger') similar to those routinely used by many communities would provide significant value to complex scientific applications. In the context of high-performance computing, an appropriate logger would provide specialized support for distributed and shared-memory parallelism and have low performance overhead. In this paper, we present our prototype implementation of pFlogger, a parallel Fortran-based logging framework, and assess its suitability for use in a complex scientific application.
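
    pFlogger itself is Fortran, so the snippet below only illustrates the kind of hierarchical, severity-filtered logging pattern the paper advocates, using Python's standard logging module; the rank tag is hard-coded here rather than queried from MPI.

```python
# Severity-filtered, hierarchically named logging with a per-process
# rank folded into every record (rank value is a stand-in, not MPI).
import logging

logging.basicConfig(
    format="%(asctime)s rank=%(rank)s %(name)s %(levelname)s: %(message)s",
    level=logging.INFO)

log = logging.getLogger("model.dynamics")       # hierarchical logger name
log = logging.LoggerAdapter(log, {"rank": 0})   # attach the (stand-in) rank

log.info("mass conservation residual: %.3e", 1.2e-13)
log.debug("suppressed: below the configured INFO severity threshold")
```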

  6. A Programming Model Performance Study Using the NAS Parallel Benchmarks

    DOE PAGES

    Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...

    2010-01-01

    Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an InfiniBand cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. Also we compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.

  7. Evaluation of Job Queuing/Scheduling Software: Phase I Report

    NASA Technical Reports Server (NTRS)

    Jones, James Patton

    1996-01-01

    The recent proliferation of high-performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, the Numerical Aerodynamic Simulation (NAS) supercomputer facility compiled a requirements checklist for job queuing/scheduling software. Next, NAS began an evaluation of the leading job management system (JMS) software packages against the checklist. This report describes the three-phase evaluation process and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still insufficient, even in the leading JMSs. However, by ranking each JMS evaluated against the requirements, we provide data that will be useful to other sites in selecting a JMS.

  8. Three closely related (2E,2′E)-3,3′-(1,4-phenylene)bis[1-(methoxyphenyl)prop-2-en-1-ones]: supramolecular assemblies in one dimension mediated by hydrogen bonding and C—H⋯π interactions

    PubMed Central

    Chidan Kumar, C. S.; Then, Li Yee; Win, Yip-Foo; Quah, Ching Kheng; Naveen, S.; Chandraju, S.; Lokanath, N. K.; Warad, Ismail

    2017-01-01

    In the title compounds, (2E,2′E)-3,3′-(1,4-phenylene)bis[1-(2-methoxyphenyl)prop-2-en-1-one], C26H22O4 (I), (2E,2′E)-3,3′-(1,4-phenylene)bis[1-(3-methoxyphenyl)prop-2-en-1-one], C26H22O4 (II), and (2E,2′E)-3,3′-(1,4-phenylene)bis[1-(3,4-dimethoxyphenyl)prop-2-en-1-one], C28H26O6 (III), the asymmetric unit consists of a half-molecule, completed by crystallographic inversion symmetry. The dihedral angles between the central and terminal benzene rings are 56.98 (8), 7.74 (7) and 7.73 (7)° for (I), (II) and (III), respectively. In the crystal of (I), molecules are linked by pairs of C—H⋯π interactions into chains running parallel to [101]. The packing of (II) and (III) features inversion dimers linked by pairs of C—H⋯O hydrogen bonds, forming R2^2(16) and R2^2(14) ring motifs, respectively, as parts of [201] and [101] chains, respectively. PMID:28638654

  9. Poly[[tetra-μ-cyanido-κ8C:N-dodeca-cyanido-κ12C-tris(N,N-dimethylformamide-κO)tris(methanol-κO)tris(3,4,7,8-tetramethyl-1,10-phenanthroline-κ2N,N′)trimanganese(II)ditungstate(V)] dihydrate

    PubMed Central

    Yang, Fei-Lin; Yang, Dan

    2014-01-01

    The asymmetric unit of the title compound, {[Mn3{W(CN)8}2(C16H16N2)3(C3H7NO)3(CH3OH)3]·2H2O}n, consists of three [Mn(N,N-dimethylformamide)(methanol)(3,4,7,8-tetramethyl-1,10-phenanthroline)]2+ cations, two [W(CN)8]3− anions and two water molecules. Each water molecule is disordered over three sets of sites, with a refined occupancy ratio of 0.310 (9):0.275 (9):0.415 (9) for one molecule and 0.335 (9):0.288 (9):0.377 (9) for the other. The MnII atoms exhibit a distorted octahedral geometry, while the WV atoms adopt a distorted square-antiprismatic geometry. The MnII and WV atoms are linked alternately through cyanide groups, forming a tetranuclear 12-atom rhombic metallacycle. Adjacent metallacycles are further connected by μ2-bridging cyanide anions, generating a 3,2-chain structure running parallel to [101]. Inter-chain π–π interactions are observed [centroid–centroid distances = 3.763 (3) and 3.620 (2) Å]. PMID:24860305

  10. High-throughput microfluidic single-cell digital polymerase chain reaction.

    PubMed

    White, A K; Heyries, K A; Doolin, C; Vaninsberghe, M; Hansen, C L

    2013-08-06

    Here we present an integrated microfluidic device for the high-throughput digital polymerase chain reaction (dPCR) analysis of single cells. This device allows for the parallel processing of single cells and executes all steps of analysis, including cell capture, washing, lysis, reverse transcription, and dPCR analysis. The cDNA from each single cell is distributed into a dedicated dPCR array consisting of 1020 chambers, each having a volume of 25 pL, using surface-tension-based sample partitioning. The high density of this dPCR format (118,900 chambers/cm²) allows the analysis of 200 single cells per run, for a total of 204,000 PCR reactions using a device footprint of 10 cm². Experiments using RNA dilutions show this device achieves shot-noise-limited performance in quantifying single molecules, with a dynamic range of 10⁴. We performed over 1200 single-cell measurements, demonstrating the use of this platform in the absolute quantification of both high- and low-abundance mRNA transcripts, as well as micro-RNAs that are not easily measured using alternative hybridization methods. We further apply the specificity and sensitivity of single-cell dPCR to performing measurements of RNA editing events in single cells. High-throughput dPCR provides a new tool in the arsenal of single-cell analysis methods, with a unique combination of speed, precision, sensitivity, and specificity. We anticipate this approach will enable new studies where high-performance single-cell measurements are essential, including the analysis of transcriptional noise, allelic imbalance, and RNA processing.
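
    Digital PCR quantification of the kind described rests on Poisson statistics: if a fraction p of chambers is positive, the mean copy number per chamber is λ = −ln(1 − p). The sketch below uses the abstract's chamber count and volume but an invented positive-chamber count.

```python
# Poisson correction for digital PCR: some positive chambers hold more
# than one molecule, so copies != positive count. Chamber count and
# volume follow the abstract; the positive count is hypothetical.
import math

chambers = 1020
volume_pl = 25.0
positive = 300                                  # invented observation

p = positive / chambers
lam = -math.log(1.0 - p)                        # mean copies per chamber
copies = lam * chambers                         # total molecules loaded
conc_per_ul = lam / (volume_pl * 1e-6)          # copies per microliter
print(f"{copies:.0f} copies, {conc_per_ul:.2e} copies/uL")
```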

  11. Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

    PubMed

    Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

    2014-07-05

    A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on a large and fine calculation cell (2048³ grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent scalability to the parallelization, running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of the chymotrypsin inhibitor 2 mutant. The results demonstrate that the massively parallel 3D-RISM program is effective for analyzing the hydration properties of large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.

  12. Parallelization of the FLAPW method and comparison with the PPW method

    NASA Astrophysics Data System (ADS)

    Canning, Andrew; Mannstadt, Wolfgang; Freeman, Arthur

    2000-03-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. In the past the FLAPW method has been limited to systems of about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell running on up to 512 processors on a Cray T3E parallel supercomputer. Some results will also be presented on a comparison of the plane-wave pseudopotential method and the FLAPW method on large systems.

  13. A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

    NASA Technical Reports Server (NTRS)

    Jones, Mark Howard

    1987-01-01

    A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
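
    The two move types described (cell exchanges and cell displacements) with Metropolis acceptance can be sketched serially as below; the paper's hypercube version additionally distributes cells across processors and broadcasts location updates via its tree strategy, which is not shown. All sizes and the cost model are illustrative.

```python
# Serial simulated-annealing placement sketch: minimize total Manhattan
# wirelength over random two-pin nets with exchange/displacement moves.
import math
import random

random.seed(1)
n_cells, grid = 20, 8
pos = {c: (random.randrange(grid), random.randrange(grid))
       for c in range(n_cells)}
nets = [(random.randrange(n_cells), random.randrange(n_cells))
        for _ in range(40)]

def cost():
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)

T = 10.0
while T > 0.01:
    for _ in range(200):
        before = cost()
        a, b = random.sample(range(n_cells), 2)
        saved = (pos[a], pos[b])
        if random.random() < 0.5:
            pos[a], pos[b] = pos[b], pos[a]     # cell exchange
        else:                                    # cell displacement
            pos[a] = (random.randrange(grid), random.randrange(grid))
        delta = cost() - before
        # Metropolis criterion: always accept improvements, sometimes
        # accept uphill moves with probability exp(-delta / T).
        if delta > 0 and random.random() >= math.exp(-delta / T):
            pos[a], pos[b] = saved               # reject, restore
    T *= 0.9                                     # cooling schedule
print("final wirelength:", cost())
```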

  14. Ethyl 2-[(carbamothioylamino)imino]propanoate

    PubMed Central

    Corrêa, Charlane C.; Graúdo, José Eugênio J.C.; de Oliveira, Luiz Fernando C.; de Almeida, Mauro V.; Diniz, Renata

    2011-01-01

    The title compound, C6H11N3O2S, consists of a roughly planar molecule (r.m.s. deviation from planarity = 0.077 Å for the non-H atoms) and has the S atom in an anti position to the imine N atom. This N atom is the acceptor of a strongly bent internal N—H⋯N hydrogen bond donated by the amino group. In the crystal, molecules are arranged in undulating layers parallel to (010). The molecules are linked via intermolecular amino–carboxyl N—H⋯O hydrogen bonds, forming chains parallel to [001]. The chains are cross-linked by N(carbazone)—H⋯S and C—H⋯S interactions, forming infinite sheets. PMID:22091006

  15. 2,2′-Dimethoxy-4,4′-[rel-(2R,3S)-2,3-dimethylbutane-1,4-diyl]diphenol

    PubMed Central

    Salinas-Salazar, Carmen L.; del Rayo Camacho-Corona, María; Bernès, Sylvain; Waksman de Torres, Noemi

    2009-01-01

    The title molecule, C20H26O4, commonly known as meso-dihydroguaiaretic acid, is a naturally occurring lignan extracted from Larrea tridentata and other plants. The molecule has a noncrystallographic inversion center situated at the midpoint of the central C—C bond, generating the meso stereoisomer. The central C—C—C—C alkyl chain displays an all-trans conformation, allowing an almost parallel arrangement of the benzene rings, which make a dihedral angle of 5.0 (3)°. Both hydroxy groups form weak O—H⋯O—H chains of hydrogen bonds along [100]. The resulting supramolecular structure is an undulating plane parallel to (010). PMID:21583141

  16. Rapid and Sensitive Assessment of Globin Chains for Gene and Cell Therapy of Hemoglobinopathies

    PubMed Central

    Loucari, Constantinos C.; Patsali, Petros; van Dijk, Thamar B.; Stephanou, Coralea; Papasavva, Panayiota; Zanti, Maria; Kurita, Ryo; Nakamura, Yukio; Christou, Soteroulla; Sitarou, Maria; Philipsen, Sjaak; Lederer, Carsten W.; Kleanthous, Marina

    2018-01-01

    The β-hemoglobinopathies sickle cell anemia and β-thalassemia are the focus of many gene-therapy studies. A key disease parameter is the abundance of globin chains because it indicates the level of anemia, likely toxicity of excess or aberrant globins, and therapeutic potential of induced or exogenous β-like globins. Reversed-phase high-performance liquid chromatography (HPLC) allows versatile and inexpensive globin quantification, but commonly applied protocols suffer from long run times, high sample requirements, or inability to separate murine from human β-globin chains. The latter point is problematic for in vivo studies with gene-addition vectors in murine disease models and mouse/human chimeras. This study demonstrates HPLC-based measurements of globin expression (1) after differentiation of the commonly applied human umbilical cord blood–derived erythroid progenitor-2 cell line, (2) in erythroid progeny of CD34+ cells for the analysis of clustered regularly interspaced short palindromic repeats/Cas9-mediated disruption of the globin regulator BCL11A, and (3) of transgenic mice holding the human β-globin locus. At run times of 8 min for separation of murine and human β-globin chains as well as of human γ-globin chains, and with routine measurement of globin-chain ratios for 12 nL of blood (tested for down to 0.75 nL) or of 300,000 in vitro differentiated cells, the methods presented here and any variant-specific adaptations thereof will greatly facilitate evaluation of novel therapy applications for β-hemoglobinopathies. PMID:29325430

  17. Scaling up antiretroviral therapy in Uganda: using supply chain management to appraise health systems strengthening.

    PubMed

    Windisch, Ricarda; Waiswa, Peter; Neuhann, Florian; Scheibe, Florian; de Savigny, Don

    2011-08-01

    Strengthened national health systems are necessary for effective and sustained expansion of antiretroviral therapy (ART). ART and its supply chain management in Uganda are largely based on parallel and externally supported efforts. The question arises whether systems are being strengthened to sustain access to ART. This study applies systems thinking to assess supply chain management, the role of external support and whether investments create the synergies needed to strengthen health systems. This study uses the WHO health systems framework and examines the issues of governance, financing, information, human resources and service delivery in relation to supply chain management of medicines and technologies. It looks at links and causal chains between supply chain management for ART and the national supply system for essential drugs. It combines data from the literature and key informant interviews with observations at the health service delivery level in a study district. Current drug supply chain management in Uganda is characterized by parallel processes and information systems that result in poor quality and inefficiencies. Lower than expected health system performance, stock-outs and other shortages affect ART and primary care in general. Poor performance of supply chain management is amplified by weak conditions at all levels of the health system, including the areas of financing, governance, human resources and information. Governance issues include the failure to follow up on initial policy intentions and a focus on narrow, short-term approaches. The opportunity and need to use ART investments for essential supply chain management and a strengthened health system has not been exploited. By applying a systems perspective this work indicates the seriousness of the missing system prerequisites. The findings suggest that root causes and capacities across the system have to be addressed synergistically to enable systems that can match and accommodate investments in disease-specific interventions. The multiplicity and complexity of existing challenges require a long-term systems perspective, in contrast to the current short-term and program-specific nature of external assistance.

  18. Parallelism in Manipulator Dynamics. Revision.

    DTIC Science & Technology

    1983-12-01

    computing the motor torques required to drive a lower-pair kinematic chain (e.g., a typical manipulator arm in free motion, or a mechanical leg in the... computations, and presents two "mathematically exact" formulations especially suited to high-speed, highly parallel implementations using special-purpose... DYNAMICS by RICHARD HAROLD LATHROP. ABSTRACT: This paper addresses the problem of efficiently computing the motor torques required to drive a lower-pair

  19. Partitioning of electron flux between the respiratory chains of the yeast Candida parapsilosis: parallel working of the two chains.

    PubMed

    Guerin, M G; Camougrand, N M

    1994-02-08

    Partitioning of the electron flux between the classical and the alternative respiratory chains of the yeast Candida parapsilosis was measured as a function of the oxidation rate and of the Q-pool redox poise. At low respiration rate, electrons from external NADH travelled preferentially through the alternative pathway, as indicated by the antimycin A-insensitivity of electron flow. Inhibition of the alternative pathway by SHAM restored full antimycin A-sensitivity to the remaining electron flow. The dependence of the respiratory rate on the redox poise of the quinone pool was investigated when the electron flux was mediated either by the main respiratory chain (growth in the absence of antimycin A) or by the second respiratory chain (growth in the presence of antimycin A). In the former case, a linear relationship was found between these two parameters. In contrast, in the latter case, the relationship between Q-pool reduction level and electron flux was non-linear, but it could be resolved into two distinct curves. The second quinone is not reducible in the presence of antimycin A but only in the presence of high concentrations of myxothiazol or cyanide. Since two quinone species exist in C. parapsilosis, UQ9 and Qx (C33H54O4), we hypothesized that these two curves could correspond to the functioning of the second quinone engaged during alternative pathway activity. Partitioning of electrons between both respiratory chains could occur upstream of complex III, with the second chain functioning in parallel to the main one, and with the additional possibility of merging into the main one at the complex IV level.

  20. High-throughput sequence alignment using Graphics Processing Units

    PubMed Central

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-01-01

    Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
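
    The parallelization pattern at the heart of MUMmerGPU, many independent queries matched against one shared reference index, can be sketched without a GPU at all. The toy below uses a process pool and plain substring search in place of the CUDA suffix-tree kernel; the sequences and names are invented for illustration:

```python
# Toy sketch of MUMmerGPU's parallelization idea: one shared reference,
# many independent queries processed in parallel. Real MUMmerGPU walks a
# suffix tree on the GPU; here we just use str.find in worker processes.
from multiprocessing import Pool

REFERENCE = "ACGTACGTTAGCACGTAGGCTTACGATCGATCGTACG"  # made-up sequence

def align_query(query: str):
    """Return (query, list of exact-match positions in the reference)."""
    hits, start = [], REFERENCE.find(query)
    while start != -1:
        hits.append(start)
        start = REFERENCE.find(query, start + 1)  # allow overlaps
    return query, hits

if __name__ == "__main__":
    queries = ["ACGT", "TAGC", "GGCT", "CGAT"]
    with Pool(processes=4) as pool:   # queries are fully independent
        for query, hits in pool.map(align_query, queries):
            print(query, "->", hits)
```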

  1. Research in Parallel Computing: 1987-1990

    DTIC Science & Technology

    1994-08-05

    emulation, we layered UNIX BSD 4.3 functionality above the kernel primitives, but packaged both as a monolithic unit running in privileged state. This... further, so that only a "pure kernel" or "microkernel" runs in privileged mode, while the other components of the environment execute as one or more client... [report table-of-contents fragments: 2.2.2 Nectar's communication software; 2.2.3 A Nectar programming interface; 2.3 System evaluation]

  2. 6. Aerial view of turnpike alignment running from lower left ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    6. Aerial view of turnpike alignment running from lower left diagonally up to right along row of trees. Migel Estate and Farm buildings (HABS No. NY-6356) located at lower right of photograph. W.K. Smith house (HABS No. NY-6356-A) located within clump of trees at lower center, with poultry houses (HABS No. NY-6356-F and G) visible left of the clump of trees. View looking south. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY

  3. 4. Aerial view of turnpike path running through center of ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    4. Aerial view of turnpike path running through center of photograph along row of trees. South edge of original alignment visible at left at cluster of white trailers. North edge of original alignment visible at right at the W.K. Smith house (HABS No. NY-6356-A) at the top right corner. Migel mansion visible on ridgetop at right-center of photograph, surrounded by trees. View looking west. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY

  4. CCC7-119 Reactive Molecular Dynamics Simulations of Hot Spot Growth in Shocked Energetic Materials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, Aidan P.

    2015-03-01

    The purpose of this work is to understand how defects control initiation in energetic materials used in stockpile components. Sequoia gives us the core count to run very large-scale simulations of up to 10 million atoms. Using an OpenMP-threaded implementation of the ReaxFF package in LAMMPS, we have been able to get good parallel efficiency running on 16k nodes of Sequoia, with one hardware thread per core.

  5. User’s Guide for the Coupled Ocean/Atmospheric Mesoscale Prediction System (COAMPS) Version 5.0

    DTIC Science & Technology

    2010-03-30

    provides tools for common modeling functions, as well as regridding, data decomposition, and communication on parallel computers. NRL/MR/7320... specified gncomDir. If running COAMPS at the DSRC (e.g. BABBAGE, DAVINCI, or EINSTEIN), the global NCOM files will be copied to /scr/[user]/COAMPS/data... the site (DSRC or local) and the platform (BABBAGE, DAVINCI, EINSTEIN, or local machine) on which COAMPS is being run. site=navy_dsrc (for DSRC

  6. Early Detection Of Failure Mechanisms In Resilient Biostructures: A Network Flow Study

    DTIC Science & Technology

    2017-10-01

    of flat blades of solid cartilage (sawfishes and some sharks) or simple tubes of bone (swordfish, marlin, etc.) and do not vary appreciably in size... cartilage: The hard cartilage is formed by two flat sections that are almost parallel to each other and run along the longitudinal axis of the rostrum... rostrum subjected to a uniform pressure: soft cartilage: The soft cartilage is located at the center of the rostrum and runs in the longitudinal Z

  7. XPOSE: the Exxon Nuclear revised LEOPARD

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skogen, F.B.

    1975-04-01

    Main differences between XPOSE and LEOPARD codes used to generate fast and thermal neutron spectra and cross sections are presented. Models used for fast and thermal spectrum calculations as well as the depletion calculations considering U-238 chain, U-235 chain, xenon and samarium, fission products and boron-10 are described. A detailed description of the input required to run XPOSE and a description of the output are included. (FS)

  8. The first quaternary lanthanide(III) nitride iodides: NaM{sub 4}N{sub 2}I{sub 7} (M=La-Nd)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schurz, Christian M.; Schleid, Thomas, E-mail: schleid@iac.uni-stuttgart.d

    In attempts to synthesize lanthanide(III) nitride iodides with the formula M₂NI₃ (M = La-Nd), moisture-sensitive single crystals of the first quaternary sodium lanthanide(III) nitride iodides NaM₄N₂I₇ (orthorhombic, Pna2₁; Z = 4; a = 1391-1401, b = 1086-1094, c = 1186-1211 pm) could be obtained. The dominating structural features are ¹∞{[NM₄/₂ᵉ]³⁺} chains of trans-edge linked [NM₄]⁹⁺ tetrahedra, which run parallel to the polar 2₁-axis [001]. Between the chains, direct bonding via special iodide anions generates cages, in which isolated [NaI₆]⁵⁻ octahedra are embedded. The IR spectrum of NaLa₄N₂I₇ recorded from 100 to 1000 cm⁻¹ shows main bands at υ = 337, 373 and 489 cm⁻¹. With decreasing radii of the lanthanide trications these bands, which can be assigned as an influence of the vibrations of the condensed [NM₄]⁹⁺ tetrahedra, are shifted toward higher frequencies for the NaM₄N₂I₇ series (M = La-Nd), following the lanthanide contraction. - Abstract: View at the main structural features of the NaM₄N₂I₇ series (M = La-Nd): The ¹∞{[NM₄/₂ᵉ]³⁺} chains, consisting of trans-edge connected [NM₄]⁹⁺ tetrahedra, and the special kind of iodide anions, namely (I7)⁻, form cages, in which isolated [NaI₆]⁵⁻ octahedra are embedded.

  9. NAS Parallel Benchmarks. 2.4

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    We describe a new problem size, called Class D, for the NAS Parallel Benchmarks (NPB), whose MPI source code implementation is being released as NPB 2.4. A brief rationale is given for how the new class is derived. We also describe the modifications made to the MPI (Message Passing Interface) implementation to allow the new class to be run on systems with 32-bit integers, and with moderate amounts of memory. Finally, we give the verification values for the new problem size.

  10. Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, Terry R

    2011-01-01

    This paper describes a kernel scheduling algorithm that is based on co-scheduling principles and that is intended for parallel applications running on 1000 cores or more, where inter-node scalability is key. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.

  11. Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach

    NASA Technical Reports Server (NTRS)

    Mak, Victor W. K.

    1986-01-01

    Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.

  12. Secure web-based invocation of large-scale plasma simulation codes

    NASA Astrophysics Data System (ADS)

    Dimitrov, D. A.; Busby, R.; Exby, J.; Bruhwiler, D. L.; Cary, J. R.

    2004-12-01

    We present our design and initial implementation of a web-based system for running, both in parallel and serial, Particle-In-Cell (PIC) codes for plasma simulations with automatic post processing and generation of visual diagnostics.

  13. 78 FR 9687 - Prineville Energy Storage, LLC; Notice of Preliminary Permit Application Accepted for Filing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-11

    ... PDCI to the Ponderosa substation, or (ii) the Bonneville Power Administration (BPA) existing transmission line corridor and then running parallel to the BPA line to the Ponderosa substation; and (9...

  14. Implementation of a fully-balanced periodic tridiagonal solver on a parallel distributed memory architecture

    NASA Technical Reports Server (NTRS)

    Eidson, T. M.; Erlebacher, G.

    1994-01-01

    While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem - a periodic tridiagonal solver - are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate the various strategies. The particular tridiagonal solver evaluated is used in many computational fluid dynamic simulation codes. The feature that makes this algorithm unique is that these simulation codes usually require simultaneous solutions for multiple right-hand-sides (RHS) of the system of equations. Each RHS solution is independent and thus can be computed in parallel. Thus a Gaussian elimination type algorithm can be used in a parallel computation, and the more complicated approaches such as cyclic reduction are not required. The two strategies are a transpose strategy and a distributed solver strategy. For the transpose strategy, the data is moved so that a subset of all the RHS problems is solved on each of the several processors. This usually requires significant data movement between processor memories across a network. The second strategy allows the data to flow across processor boundaries in a chained manner, which usually requires significantly less data movement. An approach to accomplish this second strategy in a near-perfect load-balanced manner is developed. In addition, an algorithm will be shown to directly transform a sequential Gaussian elimination type algorithm into the parallel chained, load-balanced algorithm.
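
    The key observation above, that each RHS can be solved independently with plain Gaussian elimination, can be sketched as follows. This toy maps the Thomas algorithm over several right-hand sides with a process pool; it handles an ordinary (non-periodic) tridiagonal system, so the periodic corner terms of the paper's solver are omitted:

```python
# Sketch: with many independent right-hand sides, a simple Gaussian-
# elimination (Thomas) solve can be mapped over the RHS vectors in
# parallel, with no need for cyclic reduction. Non-periodic toy only.
from multiprocessing import Pool

# Constant-coefficient tridiagonal system:
#   A*x[i-1] + B*x[i] + C*x[i+1] = d[i]
A, B, C = 1.0, 4.0, 1.0
N = 5

def thomas_solve(d):
    """Solve the N x N tridiagonal system for one right-hand side d."""
    cp, dp = [0.0] * N, [0.0] * N
    cp[0], dp[0] = C / B, d[0] / B
    for i in range(1, N):              # forward elimination
        m = B - A * cp[i - 1]
        cp[i] = C / m
        dp[i] = (d[i] - A * dp[i - 1]) / m
    x = [0.0] * N
    x[-1] = dp[-1]
    for i in range(N - 2, -1, -1):     # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

if __name__ == "__main__":
    rhs_list = [[float(i + j) for i in range(N)] for j in range(8)]
    with Pool(4) as pool:              # each RHS is independent
        for x in pool.map(thomas_solve, rhs_list):
            print([round(v, 4) for v in x])
```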

  15. A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

    PubMed

    Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

    2002-02-01

    This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
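
    The partitioning scheme is straightforward to sketch: divide the photon histories evenly among workers and give each an independent, uncorrelated random stream. The stand-in below uses multiprocessing and NumPy's SeedSequence in place of MPI and SPRNG, with a trivial dummy transport kernel:

```python
# Sketch of the photon-partitioning scheme: split the histories evenly
# across workers, each with an independent random stream. The real code
# uses MPI and the SPRNG library; this stand-in uses multiprocessing
# and NumPy's SeedSequence, with a dummy "transport" kernel.
from multiprocessing import Pool
import numpy as np

TOTAL_PHOTONS = 1_000_000
WORKERS = 8

def simulate(args):
    """Dummy transport kernel: count 'detected' photons in this share."""
    n_photons, seed = args
    rng = np.random.default_rng(seed)
    return int((rng.random(n_photons) < 0.05).sum())  # fake 5% detection

if __name__ == "__main__":
    seeds = np.random.SeedSequence(12345).spawn(WORKERS)  # uncorrelated
    share = TOTAL_PHOTONS // WORKERS
    with Pool(WORKERS) as pool:
        counts = pool.map(simulate, [(share, s) for s in seeds])
    print("per-worker detections:", counts)
    print("total detections:     ", sum(counts))
```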

  16. Porting Ordinary Applications to Blue Gene/Q Supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maheshwari, Ketan C.; Wozniak, Justin M.; Armstrong, Timothy

    2015-08-31

    Efficiently porting ordinary applications to Blue Gene/Q supercomputers is a significant challenge. Codes are often originally developed without considering advanced architectures and related tool chains. Science needs frequently lead users to want to run large numbers of relatively small jobs (often called many-task computing, an ensemble, or a workflow), which can conflict with supercomputer configurations. In this paper, we discuss techniques developed to execute ordinary applications over leadership class supercomputers. We use the high-performance Swift parallel scripting framework and build two workflow execution techniques: sub-jobs and main-wrap. The sub-jobs technique, built on top of the IBM Blue Gene/Q resource manager Cobalt's sub-block jobs, lets users submit multiple, independent, repeated smaller jobs within a single larger resource block. The main-wrap technique is a scheme that enables C/C++ programs to be defined as functions that are wrapped by a high-performance Swift wrapper and that are invoked as a Swift script. We discuss the needs, benefits, technicalities, and current limitations of these techniques. We further discuss the real-world science enabled by these techniques and the results obtained.

  17. Theory and implementation of a very high throughput true random number generator in field programmable gate array

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Yonggang, E-mail: wangyg@ustc.edu.cn; Hui, Cong; Liu, Chong

    The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.

  18. Theory and implementation of a very high throughput true random number generator in field programmable gate array.

    PubMed

    Wang, Yonggang; Hui, Cong; Liu, Chong; Xu, Chao

    2016-04-01

    The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.

  19. Development and validation of a multiplex real-time PCR method to simultaneously detect 47 targets for the identification of genetically modified organisms.

    PubMed

    Cottenet, Geoffrey; Blancpain, Carine; Sonnard, Véronique; Chuah, Poh Fong

    2013-08-01

    Considering the increase in the total cultivated land area dedicated to genetically modified organisms (GMO), consumers' perception of GMO and the need to comply with various local GMO legislations, efficient and accurate analytical methods are needed for their detection and identification. Considered the gold standard for GMO analysis, the real-time polymerase chain reaction (RTi-PCR) technology was optimised to produce a high-throughput GMO screening method. Based on 24 simultaneous multiplex RTi-PCR assays run on a ready-to-use 384-well plate, this new procedure allows the detection and identification of 47 targets on seven samples in duplicate. To comply with GMO analytical quality requirements, a negative and a positive control were analysed in parallel. In addition, an internal positive control was also included in each reaction well for the detection of potential PCR inhibition. Tested on non-GM materials, on different GM events and on proficiency test samples, the method offered high specificity and sensitivity with an absolute limit of detection between 1 and 16 copies depending on the target. Easy to use, fast and cost efficient, this multiplex approach fits the purpose of GMO testing laboratories.

  20. Drastic stabilization of parallel DNA hybridizations by a polylysine comb-type copolymer with hydrophilic graft chain.

    PubMed

    Miyoshi, Daisuke; Ueda, Yu-Mi; Shimada, Naohiko; Nakano, Shu-Ichi; Sugimoto, Naoki; Maruyama, Atsushi

    2014-09-01

    Electrostatic interactions play a major role in protein-DNA interactions. As a model system of a cationic protein, herein we focused on a comb-type copolymer of a polycation backbone and dextran side chains, poly(L-lysine)-graft-dextran (PLL-g-Dex), which has been reported to form soluble interpolyelectrolyte complexes with DNA strands. We investigated the effects of PLL-g-Dex on the conformation and thermodynamics of DNA oligonucleotides forming various secondary structures. Thermodynamic analysis of the DNA structures showed that the parallel conformations involved in both DNA duplexes and triplexes were significantly and specifically stabilized by PLL-g-Dex. On the basis of thermodynamic parameters, it was further possible to design DNA switches that undergo structural transition responding to PLL-g-Dex from an antiparallel duplex to a parallel triplex even with mismatches in the third strand hybridization. These results suggest that polycationic molecules are able to induce structural polymorphism of DNA oligonucleotides, because of the conformation-selective stabilization effects. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Room-temperature synthesis of two-dimensional ultrathin gold nanowire parallel array with tunable spacing.

    PubMed

    Morita, Clara; Tanuma, Hiromitsu; Kawai, Chika; Ito, Yuki; Imura, Yoshiro; Kawai, Takeshi

    2013-02-05

    A series of long-chain amidoamine derivatives with different alkyl chain lengths (CnAA where n is 12, 14, 16, or 18) were synthesized and studied with regard to their ability to form organogels and to act as soft templates for the production of Au nanomaterials. These compounds were found to self-assemble into lamellar structures and exhibited gelation ability in some apolar solvents. The gelation concentration, gel-sol phase transition temperature, and lattice spacing of the lamellar structures in organic solvent all varied on the basis of the alkyl chain length of the particular CnAA compound employed. The potential for these molecules to function as templates was evaluated through the synthesis of Au nanowires (NWs) in their organogels. Ultrathin Au NWs were obtained from all CnAA/toluene gel systems, each within an optimal temperature range. Interestingly, in the case of C12AA and C14AA, it was possible to fabricate ultrathin Au NWs at room temperature. In addition, two-dimensional parallel arrays of ultrathin Au NWs were self-assembled onto TEM copper grids as a result of the drying of dispersion solutions of these NWs. The use of CnAA compounds with differing alkyl chain lengths enabled precise tuning of the distance between the Au NWs in these arrays.

  2. 30 CFR 77.1906 - Hoists; daily inspection.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... wheels, etc.), connections, links and chains, and other facilities. (b) Prior to each working shift, and... shall be run by the hoist operator through one complete cycle of operation before any person is...

  3. Reciprocal Sliding Friction Model for an Electro-Deposited Coating and Its Parameter Estimation Using Markov Chain Monte Carlo Method

    PubMed Central

    Kim, Kyungmok; Lee, Jaewook

    2016-01-01

    This paper describes a sliding friction model for an electro-deposited coating. Reciprocating sliding tests using ball-on-flat plate test apparatus are performed to determine an evolution of the kinetic friction coefficient. The evolution of the friction coefficient is classified into the initial running-in period, steady-state sliding, and transition to higher friction. The friction coefficient during the initial running-in period and steady-state sliding is expressed as a simple linear function. The friction coefficient in the transition to higher friction is described with a mathematical model derived from Kachanov-type damage law. The model parameters are then estimated using the Markov Chain Monte Carlo (MCMC) approach. It is identified that estimated friction coefficients obtained by MCMC approach are in good agreement with measured ones. PMID:28773359
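
    For readers unfamiliar with the MCMC step, a minimal random-walk Metropolis sampler for this kind of model fit looks as follows. The data, priors and proposal widths here are invented for illustration, and the paper's Kachanov-type damage law is replaced by the simple linear running-in model mu(n) = mu0 + k*n:

```python
# Minimal random-walk Metropolis sketch for MCMC parameter estimation,
# fitting mu(n) = mu0 + k*n to noisy friction measurements. Synthetic
# data and flat box priors are invented; the paper's damage-law model
# and tuning details are not reproduced.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "measured" friction coefficients over sliding cycles.
cycles = np.arange(0, 1000, 10)
true_mu0, true_k, noise = 0.12, 2e-5, 0.005
data = true_mu0 + true_k * cycles + rng.normal(0, noise, cycles.size)

def log_post(theta):
    mu0, k = theta
    if not (0 < mu0 < 1 and 0 < k < 1e-3):      # flat prior on a box
        return -np.inf
    resid = data - (mu0 + k * cycles)
    return -0.5 * np.sum((resid / noise) ** 2)  # Gaussian likelihood

theta = np.array([0.1, 1e-5])                   # starting point
lp = log_post(theta)
samples = []
for _ in range(20000):
    prop = theta + rng.normal(0, [2e-3, 2e-7])  # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:     # Metropolis acceptance
        theta, lp = prop, lp_prop
    samples.append(theta)
post = np.array(samples[5000:])                 # discard burn-in
print("mu0 ~", post[:, 0].mean(), "  k ~", post[:, 1].mean())
```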

  4. Optimizing the way kinematical feed chains with great distance between slides are chosen for CNC machine tools

    NASA Astrophysics Data System (ADS)

    Lucian, P.; Gheorghe, S.

    2017-08-01

    This paper presents a new method, based on the FRISCO formula, for optimizing the choice of the best control system for kinematical feed chains with a great distance between slides, as used in computer numerical controlled machine tools. Such machines are typically used for machining large and complex parts (mostly in the aviation industry) or complex casting molds. For such machine tools the kinematic feed chains are arranged in a dual-parallel drive structure that allows the mobile element to be moved by the two kinematical branches and their related control systems. Such an arrangement allows for high speed and high rigidity (a critical requirement for precision machining) during the machining process. A significant issue for such an arrangement is the ability of the two parallel control systems to follow the same trajectory accurately. To address this issue it is necessary to achieve synchronous motion control of the two kinematical branches, ensuring that the correct perpendicular position is kept by the mobile element during its motion on the two slides.

  5. Transmittance tuning by particle chain polarization in electrowetting-driven droplets

    PubMed Central

    Fan, Shih-Kang; Chiu, Cheng-Pu; Huang, Po-Wen

    2010-01-01

    A tiny droplet containing nano/microparticles, as commonly handled in digital microfluidic lab-on-a-chip devices, is regarded as a micro-optical component with tunable transmittance at programmable positions for micro-opto-fluidic-system applications. Cross-scale electric manipulations of droplets on a millimeter scale as well as suspended particles on a micrometer scale are demonstrated by electrowetting-on-dielectric (EWOD) and particle chain polarization, respectively. By applying electric fields at proper frequency ranges, EWOD and polarization can be selectively achieved in designed and fabricated parallel plate devices. At low frequencies, the applied signal generates EWOD to pump suspension droplets. The evenly dispersed particles reflect and/or absorb the incident light to exhibit a reflective or dark droplet. When sufficiently high frequencies are applied to the nonsegmented parallel electrodes, a uniform electric field is established across the liquid to polarize the dispersed neutral particles. The induced dipole moments attract the particles to each other to form particle chains and increase the transmittance of the suspension, demonstrating a transmissive or bright droplet. In addition, the reflectance of the droplet is measured at various frequencies with different amplitudes. PMID:21267088

  6. Parallel matrix multiplication on the Connection Machine

    NASA Technical Reports Server (NTRS)

    Tichy, Walter F.

    1988-01-01

    Matrix multiplication is a computation and communication intensive problem. Six parallel algorithms for matrix multiplication on the Connection Machine are presented and compared with respect to their performance and processor usage. For n by n matrices, the algorithms have theoretical running times of O(n² log n), O(n log n), O(n), and O(log n), and require n, n², n², and n³ processors, respectively. With careful attention to communication patterns, the theoretically predicted runtimes can indeed be achieved in practice. The parallel algorithms illustrate the tradeoffs between performance, communication cost, and processor usage.
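
    The quoted complexities trade processors for time, and the product of the two (total work) makes the tradeoff visible at a glance. A purely arithmetic illustration:

```python
# Processor-time tradeoff of the quoted algorithm classes: compare the
# total work (time x processors) for an n x n multiply. Illustrative
# arithmetic only; constant factors and communication are ignored.
import math

def work_table(n: int) -> None:
    algs = [
        ("O(n^2 log n) time, n   procs", n**2 * math.log2(n) * n),
        ("O(n log n)   time, n^2 procs", n * math.log2(n) * n**2),
        ("O(n)         time, n^2 procs", n * n**2),
        ("O(log n)     time, n^3 procs", math.log2(n) * n**3),
    ]
    for name, work in algs:
        print(f"{name}: total work ~ {work:.3e}")

# The O(n)-time variant is the only one whose work stays at n^3,
# i.e. it matches the serial algorithm's operation count.
work_table(1024)
```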

  7. Parallel simulations of Grover's algorithm for closest match search in neutron monitor data

    NASA Astrophysics Data System (ADS)

    Kussainov, Arman; White, Yelena

    We are studying the parallel implementations of Grover's closest match search algorithm for neutron monitor data analysis. This includes data formatting, and matching quantum parameters to a conventional structure of a chosen programming language and selected experimental data type. We have employed several workload distribution models based on acquired data and search parameters. As a result of these simulations, we have an understanding of potential problems that may arise during configuration of real quantum computational devices and the way they could run tasks in parallel. The work was supported by the Science Committee of the Ministry of Science and Education of the Republic of Kazakhstan Grant #2532/GF3.

  8. Multitasking domain decomposition fast Poisson solvers on the Cray Y-MP

    NASA Technical Reports Server (NTRS)

    Chan, Tony F.; Fatoohi, Rod A.

    1990-01-01

    The results of multitasking implementation of a domain decomposition fast Poisson solver on eight processors of the Cray Y-MP are presented. The object of this research is to study the performance of domain decomposition methods on a Cray supercomputer and to analyze the performance of different multitasking techniques using highly parallel algorithms. Two implementations of multitasking are considered: macrotasking (parallelism at the subroutine level) and microtasking (parallelism at the do-loop level). A conventional FFT-based fast Poisson solver is also multitasked. The results of different implementations are compared and analyzed. A speedup of over 7.4 on the Cray Y-MP running in a dedicated environment is achieved for all cases.

  9. Eigensolver for a Sparse, Large Hermitian Matrix

    NASA Technical Reports Server (NTRS)

    Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

    2003-01-01

    A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
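
    The algorithmic core, a Lanczos iteration that extracts a few eigenvalues of a large sparse Hermitian matrix, can be reproduced in miniature with SciPy's ARPACK wrapper (serial, unlike the PARPACK-based program above):

```python
# The Lanczos/ARPACK idea in miniature: find a few extreme eigenvalues
# of a large sparse Hermitian matrix without dense factorizations.
# SciPy's eigsh wraps serial ARPACK, so this illustrates only the
# algorithmic core, not the MPI parallelism of the program above.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 100_000  # large but sparse: a tridiagonal Hermitian test matrix
main = np.linspace(0.0, 1.0, n)
off = np.full(n - 1, 0.1)
H = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")

# k smallest-algebraic eigenvalues via the Lanczos iteration.
vals = eigsh(H, k=4, which="SA", return_eigenvectors=False)
print(np.sort(vals))
```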

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boman, Erik G.

    This LDRD project was a campus exec fellowship to fund (in part) Donald Nguyen's PhD research at UT-Austin. His work has focused on parallel programming models, and scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance by advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.

  11. Stability of vertical magnetic chains

    PubMed Central

    2017-01-01

    A linear stability analysis is performed for a pair of coaxial vertical chains made from permanently magnetized balls under the influence of gravity. While one chain rises from the ground, the other hangs from above, with the remaining ends separated by a gap of prescribed length. Various boundary conditions are considered, as are situations in which the magnetic dipole moments in the two chains are parallel or antiparallel. The case of a single chain attached to the ground is also discussed. The stability of the system is examined with respect to three quantities: the number of balls in each chain, the length of the gap between the chains, and a single dimensionless parameter which embodies the competition between magnetic and gravitational forces. Asymptotic scaling laws involving these parameters are provided. The Hessian matrix is computed in exact form, allowing the critical parameter values at which the system loses stability and the respective eigenmodes to be determined up to machine precision. A comparison with simple experiments for a single chain attached to the ground shows good agreement. PMID:28293135

  12. Stability of vertical magnetic chains

    NASA Astrophysics Data System (ADS)

    Schönke, Johannes; Fried, Eliot

    2017-02-01

    A linear stability analysis is performed for a pair of coaxial vertical chains made from permanently magnetized balls under the influence of gravity. While one chain rises from the ground, the other hangs from above, with the remaining ends separated by a gap of prescribed length. Various boundary conditions are considered, as are situations in which the magnetic dipole moments in the two chains are parallel or antiparallel. The case of a single chain attached to the ground is also discussed. The stability of the system is examined with respect to three quantities: the number of balls in each chain, the length of the gap between the chains, and a single dimensionless parameter which embodies the competition between magnetic and gravitational forces. Asymptotic scaling laws involving these parameters are provided. The Hessian matrix is computed in exact form, allowing the critical parameter values at which the system loses stability and the respective eigenmodes to be determined up to machine precision. A comparison with simple experiments for a single chain attached to the ground shows good agreement.

  13. Alpine Fault, New Zealand, SRTM Shaded Relief and Colored Height

    NASA Image and Video Library

    2005-01-06

    The Alpine Fault runs parallel to, and just inland of, much of the west coast of New Zealand's South Island. This view was created from the near-global digital elevation model produced by the NASA Shuttle Radar Topography Mission (SRTM).

  14. Thread concept for automatic task parallelization in image analysis

    NASA Astrophysics Data System (ADS)

    Lueckenhaus, Maximilian; Eckstein, Wolfgang

    1998-09-01

    Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context, but may follow different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs while taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable for speeding up image processing by an automatic parallelization of image analysis tasks.
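
    The thread concept described above, shared objects and context but different data per thread, maps naturally onto a thread pool. In the Python sketch below the "image" and the task split are placeholders, and CPython's GIL means the point is the structure rather than a real speedup:

```python
# Sketch of the thread idea above: workers spawned from one subtask
# share objects and context but execute on different slices of the
# data in parallel. The image and task split are placeholders.
from concurrent.futures import ThreadPoolExecutor
import threading

image = [[(x * y) % 256 for x in range(512)] for y in range(512)]
results = {}                     # shared object, guarded by a lock
lock = threading.Lock()

def analyze_rows(rows):
    """One analysis thread: histogram its share of the image rows."""
    local = {}
    for y in rows:
        for v in image[y]:
            local[v] = local.get(v, 0) + 1
    with lock:                   # merge into the shared result
        for v, c in local.items():
            results[v] = results.get(v, 0) + c

if __name__ == "__main__":
    chunks = [range(i, 512, 4) for i in range(4)]  # 4 parallel threads
    with ThreadPoolExecutor(max_workers=4) as pool:
        pool.map(analyze_rows, chunks)
    print("pixels counted:", sum(results.values()))  # 512 * 512
```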

  15. Evolution of CMS workload management towards multicore job support

    NASA Astrophysics Data System (ADS)

    Pérez-Calero Yzquierdo, A.; Hernández, J. M.; Khan, F. A.; Letts, J.; Majewski, K.; Rodrigues, A. M.; McCrea, A.; Vaandering, E.

    2015-12-01

    The successful exploitation of multicore processor architectures is a key element of the LHC distributed computing system in the coming era of the LHC Run 2. High-pileup complex-collision events represent a challenge for the traditional sequential programming in terms of memory and processing time budget. The CMS data production and processing framework is introducing the parallel execution of the reconstruction and simulation algorithms to overcome these limitations. CMS plans to execute multicore jobs while still supporting singlecore processing for other tasks difficult to parallelize, such as user analysis. The CMS strategy for job management thus aims at integrating single and multicore job scheduling across the Grid. This is accomplished by employing multicore pilots with internal dynamic partitioning of the allocated resources, capable of running payloads of various core counts simultaneously. An extensive test programme has been conducted to enable multicore scheduling with the various local batch systems available at CMS sites, with the focus on the Tier-0 and Tier-1s, responsible during 2015 for the prompt data reconstruction. Scale tests have been run to analyse the performance of this scheduling strategy and ensure an efficient use of the distributed resources. This paper presents the evolution of the CMS job management and resource provisioning systems in order to support this hybrid scheduling model, as well as its deployment and performance tests, which will enable CMS to transition to a multicore production model for the second LHC run.

  16. Running Neuroimaging Applications on Amazon Web Services: How, When, and at What Cost?

    PubMed Central

    Madhyastha, Tara M.; Koh, Natalie; Day, Trevor K. M.; Hernández-Fernández, Moises; Kelley, Austin; Peterson, Daniel J.; Rajan, Sabreena; Woelfer, Karl A.; Wolf, Jonathan; Grabowski, Thomas J.

    2017-01-01

    The contribution of this paper is to identify and describe current best practices for using Amazon Web Services (AWS) to execute neuroimaging workflows “in the cloud.” Neuroimaging offers a vast set of techniques by which to interrogate the structure and function of the living brain. However, many of the scientists for whom neuroimaging is an extremely important tool have limited training in parallel computation. At the same time, the field is experiencing a surge in computational demands, driven by a combination of data-sharing efforts, improvements in scanner technology that allow acquisition of images with higher image resolution, and by the desire to use statistical techniques that stress processing requirements. Most neuroimaging workflows can be executed as independent parallel jobs and are therefore excellent candidates for running on AWS, but the overhead of learning to do so and determining whether it is worth the cost can be prohibitive. In this paper we describe how to identify neuroimaging workloads that are appropriate for running on AWS, how to benchmark execution time, and how to estimate cost of running on AWS. By benchmarking common neuroimaging applications, we show that cloud computing can be a viable alternative to on-premises hardware. We present guidelines that neuroimaging labs can use to provide a cluster-on-demand type of service that should be familiar to users, and scripts to estimate cost and create such a cluster. PMID:29163119
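
    The cost estimation the paper advocates reduces to simple arithmetic once a per-job runtime has been benchmarked. A back-of-envelope model, with all prices and job counts hypothetical:

```python
# Back-of-envelope AWS cost model of the kind described above: estimate
# total on-demand cost from a benchmarked per-job runtime. Every number
# below (prices, job counts, concurrency) is hypothetical.
import math

def estimate_cost(n_jobs: int, hours_per_job: float,
                  jobs_per_instance: int, price_per_hour: float) -> float:
    """Cost if all jobs run concurrently across enough instances."""
    instances = math.ceil(n_jobs / jobs_per_instance)
    return instances * hours_per_job * price_per_hour

if __name__ == "__main__":
    # e.g. 500 subjects, 3 h each, 8 concurrent jobs per instance,
    # at a hypothetical $0.40/hour on-demand instance price:
    print(f"estimated cost: ${estimate_cost(500, 3.0, 8, 0.40):.2f}")
```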

  17. Evolution of CMS Workload Management Towards Multicore Job Support

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perez-Calero Yzquierdo, A.; Hernández, J. M.; Khan, F. A.

    The successful exploitation of multicore processor architectures is a key element of the LHC distributed computing system in the coming era of the LHC Run 2. High-pileup complex-collision events represent a challenge for the traditional sequential programming in terms of memory and processing time budget. The CMS data production and processing framework is introducing the parallel execution of the reconstruction and simulation algorithms to overcome these limitations. CMS plans to execute multicore jobs while still supporting singlecore processing for other tasks difficult to parallelize, such as user analysis. The CMS strategy for job management thus aims at integrating single and multicore job scheduling across the Grid. This is accomplished by employing multicore pilots with internal dynamic partitioning of the allocated resources, capable of running payloads of various core counts simultaneously. An extensive test programme has been conducted to enable multicore scheduling with the various local batch systems available at CMS sites, with the focus on the Tier-0 and Tier-1s, responsible during 2015 for the prompt data reconstruction. Scale tests have been run to analyse the performance of this scheduling strategy and ensure an efficient use of the distributed resources. This paper presents the evolution of the CMS job management and resource provisioning systems in order to support this hybrid scheduling model, as well as its deployment and performance tests, which will enable CMS to transition to a multicore production model for the second LHC run.

  18. Parallel algorithm of VLBI software correlator under multiprocessor environment

    NASA Astrophysics Data System (ADS)

    Zheng, Weimin; Zhang, Dong

    2007-11-01

    The correlator is the key signal processing equipment of a Very Long Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass of data collected by the VLBI observatories and produces the visibility function of the target, which can be used for spacecraft positioning, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is both data-intensive and computation-intensive. This paper presents the algorithms of two parallel software correlators under multiprocessor environments. A near real-time correlator for spacecraft tracking adopts pipelining and thread-parallel technology, and runs on SMP (Symmetric Multiprocessor) servers. Another high speed prototype correlator using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm is realized on a small Beowulf cluster platform. Both correlators have the characteristics of a flexible structure, scalability, and 10-station data correlating abilities.

  19. MC3: Multi-core Markov-chain Monte Carlo code

    NASA Astrophysics Data System (ADS)

    Cubillos, Patricio; Harrington, Joseph; Lust, Nate; Foster, AJ; Stemm, Madison; Loredo, Tom; Stevenson, Kevin; Campo, Chris; Hardin, Matt; Hardy, Ryan

    2016-10-01

    MC3 (Multi-core Markov-chain Monte Carlo) is a Bayesian statistics tool that can be executed from the shell prompt or interactively through the Python interpreter with single- or multiple-CPU parallel computing. It offers Markov-chain Monte Carlo (MCMC) posterior-distribution sampling for several algorithms, Levenberg-Marquardt least-squares optimization, and uniform non-informative, Jeffreys non-informative, or Gaussian-informative priors. MC3 can share the same value among multiple parameters and fix the value of parameters to constant values, and offers Gelman-Rubin convergence testing and correlated-noise estimation with time-averaging or wavelet-based likelihood estimation methods.
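
    The multi-core pattern MC3 implements, several independent chains in separate processes followed by a Gelman-Rubin check, can be sketched compactly. The target below is a toy 1-D Gaussian; none of MC3's actual samplers or interfaces are reproduced:

```python
# Sketch of the multi-core MCMC pattern: run several chains in separate
# processes, then apply the Gelman-Rubin diagnostic. Toy 1-D Gaussian
# target; MC3's own samplers, priors and optimizers are not reproduced.
from multiprocessing import Pool
import numpy as np

def run_chain(seed, n=10000):
    """Random-walk Metropolis chain targeting a standard normal."""
    rng = np.random.default_rng(seed)
    x, out = 0.0, np.empty(n)
    for i in range(n):
        prop = x + rng.normal(0, 1.0)
        # log acceptance ratio for N(0,1): 0.5*(x^2 - prop^2)
        if np.log(rng.random()) < 0.5 * (x * x - prop * prop):
            x = prop
        out[i] = x
    return out[n // 2:]                    # discard burn-in

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat across chains."""
    chains = np.asarray(chains)
    m, n = chains.shape
    means = chains.mean(axis=1)
    B = n * means.var(ddof=1)              # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()  # within-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)            # ~1 indicates convergence

if __name__ == "__main__":
    with Pool(4) as pool:                  # one process per chain
        chains = pool.map(run_chain, [1, 2, 3, 4])
    print("R-hat =", round(float(gelman_rubin(chains)), 4))
```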

  20. Final Scientific Report: A Scalable Development Environment for Peta-Scale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karbach, Carsten; Frings, Wolfgang

    2013-02-22

    This document is the final scientific report of the project DE-SC000120 (A Scalable Development Environment for Peta-Scale Computing). The objective of this project is the extension of the Parallel Tools Platform (PTP) for applying it to peta-scale systems. PTP is an integrated development environment for parallel applications. It comprises code analysis, performance tuning, parallel debugging and system monitoring. The contribution of the Juelich Supercomputing Centre (JSC) aims to provide a scalable solution for system monitoring of supercomputers. This includes the development of a new communication protocol for exchanging status data between the target remote system and the client running PTP. The communication has to work under high latency. PTP needs to be implemented robustly and should hide the complexity of the supercomputer's architecture in order to provide transparent access to various remote systems via a uniform user interface. This simplifies the porting of applications to different systems, because PTP functions as an abstraction layer between the parallel application developer and the compute resources. The common requirement for all PTP components is that they have to interact with the remote supercomputer. For example, applications are built remotely, performance tools are attached to job submissions, and their output data resides on the remote system. Status data has to be collected by evaluating outputs of the remote job scheduler, and the parallel debugger needs to control an application executed on the supercomputer. The challenge is to provide this functionality for peta-scale systems in real time. The client-server architecture of the established monitoring application LLview, developed by the JSC, can be applied to PTP's system monitoring. LLview provides a well-arranged overview of the supercomputer's current status. A set of statistics, a list of running and queued jobs, as well as a node display mapping running jobs to their compute resources, form the user display of LLview. These monitoring features have to be integrated into the development environment. Besides showing the current status, PTP's monitoring also needs to allow for submitting and canceling user jobs. Monitoring peta-scale systems especially deals with presenting the large amount of status data in a useful manner. Users require the ability to select arbitrary levels of detail. The monitoring views have to provide a quick overview of the system state, but also need to allow for zooming into specific parts of the system in which the user is interested. At present, the major batch systems running on supercomputers are PBS, TORQUE, ALPS and LoadLeveler, which have to be supported by both the monitoring and the job controlling component. Finally, PTP needs to be designed as generically as possible, so that it can be extended for future batch systems.

  1. Catabolism of Branched Chain Amino Acids Contributes Significantly to Synthesis of Odd-Chain and Even-Chain Fatty Acids in 3T3-L1 Adipocytes.

    PubMed

    Crown, Scott B; Marze, Nicholas; Antoniewicz, Maciek R

    2015-01-01

    The branched chain amino acids (BCAA) valine, leucine and isoleucine have been implicated in a number of diseases including obesity, insulin resistance, and type 2 diabetes mellitus, although the mechanisms are still poorly understood. Adipose tissue plays an important role in BCAA homeostasis by actively metabolizing circulating BCAA. In this work, we have investigated the link between BCAA catabolism and fatty acid synthesis in 3T3-L1 adipocytes using parallel 13C-labeling experiments, mass spectrometry and model-based isotopomer data analysis. Specifically, we performed parallel labeling experiments with four fully 13C-labeled tracers, [U-13C]valine, [U-13C]leucine, [U-13C]isoleucine and [U-13C]glutamine. We measured mass isotopomer distributions of fatty acids and intracellular metabolites by GC-MS and analyzed the data using the isotopomer spectral analysis (ISA) framework. We demonstrate that 3T3-L1 adipocytes accumulate significant amounts of even chain length (C14:0, C16:0 and C18:0) and odd chain length (C15:0 and C17:0) fatty acids under standard cell culture conditions. Using a novel GC-MS method, we demonstrate that propionyl-CoA acts as the primer on fatty acid synthase for the production of odd chain fatty acids. BCAA contributed significantly to the production of all fatty acids. Leucine and isoleucine contributed at least 25% to lipogenic acetyl-CoA pool, and valine and isoleucine contributed 100% to lipogenic propionyl-CoA pool. Our results further suggest that low activity of methylmalonyl-CoA mutase and mass action kinetics of propionyl-CoA on fatty acid synthase result in high rates of odd chain fatty acid synthesis in 3T3-L1 cells. Overall, this work provides important new insights into the connection between BCAA catabolism and fatty acid synthesis in adipocytes and underscores the high capacity of adipocytes for metabolizing BCAA.

  2. NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations

    NASA Astrophysics Data System (ADS)

    Valiev, M.; Bylaska, E. J.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Van Dam, H. J. J.; Wang, D.; Nieplocha, J.; Apra, E.; Windus, T. L.; de Jong, W. A.

    2010-09-01

    The latest release of NWChem delivers an open-source computational chemistry package with extensive capabilities for large scale simulations of chemical and biological systems. Utilizing a common computational framework, diverse theoretical descriptions can be used to provide the best solution for a given scientific problem. Scalable parallel implementations and modular software design enable efficient utilization of current computational architectures. This paper provides an overview of NWChem focusing primarily on the core theoretical modules provided by the code and their parallel performance. Program summary: Program title: NWChem. Catalogue identifier: AEGI_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGI_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Open Source Educational Community License. No. of lines in distributed program, including test data, etc.: 11 709 543. No. of bytes in distributed program, including test data, etc.: 680 696 106. Distribution format: tar.gz. Programming language: Fortran 77, C. Computer: all Linux-based workstations and parallel supercomputers, Windows and Apple machines. Operating system: Linux, OS X, Windows. Has the code been vectorised or parallelized?: Code is parallelized. Classification: 2.1, 2.2, 3, 7.3, 7.7, 16.1, 16.2, 16.3, 16.10, 16.13. Nature of problem: Large-scale atomistic simulations of chemical and biological systems require efficient and reliable methods for ground and excited solutions of the many-electron Hamiltonian, analysis of the potential energy surface, and dynamics. Solution method: Ground and excited solutions of the many-electron Hamiltonian are obtained utilizing density-functional theory, many-body perturbation approach, and coupled cluster expansion. These solutions, or a combination thereof with classical descriptions, are then used to analyze the potential energy surface and perform dynamical simulations. Additional comments: Full documentation is provided in the distribution file. This includes an INSTALL file giving details of how to build the package. A set of test runs is provided in the examples directory. The distribution file for this program is over 90 Mbytes and therefore is not delivered directly when download or e-mail is requested. Instead an html file giving details of how the program can be obtained is sent. Running time: Running time depends on the size of the chemical system, the complexity of the method, the number of CPUs and the computational task. It ranges from several seconds for serial DFT energy calculations on a few atoms to several hours for parallel coupled cluster energy calculations on tens of atoms or ab-initio molecular dynamics simulations on hundreds of atoms.

  3. Two halide-containing cesium manganese vanadates: synthesis, characterization, and magnetic properties

    DOE PAGES

    Smith Pellizzeri, Tiffany M.; McGuire, Michael A.; McMillen, Colin D.; ...

    2018-01-24

    In this study, two new halide-containing cesium manganese vanadates have been synthesized by a high-temperature (580 °C) hydrothermal synthetic method from aqueous brine solutions. One compound, Cs3Mn(VO3)4Cl (1), was prepared using a mixed cesium hydroxide/chloride mineralizer, and crystallizes in the polar noncentrosymmetric space group Cmm2, with a = 16.7820(8) Å, b = 8.4765(4) Å, c = 5.7867(3) Å. This structure is built from sinusoidal zig-zag (VO3)n chains that run along the b-axis and are coordinated to Mn2+-containing (MnO4Cl) square-pyramidal units that are linked together to form layers. The cesium cations reside between the layers, but also coordinate to the chloride ion, forming a cesium chloride chain that also propagates along the b-axis. The other compound, Cs2Mn(VO3)3F (2), crystallizes in space group Pbca with a = 7.4286(2) Å, b = 15.0175(5) Å, c = 19.6957(7) Å, and was prepared using a cesium fluoride mineralizer. The structure is comprised of corner-sharing octahedral Mn2+ chains, with trans fluoride ligands acting as bridging units, whose ends are capped by (VO3)n vanadate chains to form slabs. The cesium atoms reside between the manganese vanadate layers, and also play an integral part in the structure, forming a cesium fluoride chain that runs along the b-axis. Both compounds were characterized by single-crystal X-ray diffraction, powder X-ray diffraction, and single-crystal Raman spectroscopy. Additionally, the magnetic properties of 2 were investigated: above 50 K, it displays behavior typical of a low-dimensional system with antiferromagnetic interactions, as expected for linear chains of manganese(II) within the crystal structure.

  4. Fenix, A Fault Tolerant Programming Framework for MPI Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gamell, Marc; Teranishi, Keita; Valenzuela, Eric

    2016-10-05

    Fenix provides APIs to allow the users to add fault tolerance capability to MPI-based parallel programs in a transparent manner. Fenix-enabled programs can run through process failures during program execution using a pool of spare processes accommodated by Fenix.

  5. Data intensive computing at Sandia.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wilson, Andrew T.

    2010-09-01

    Data-intensive computing is parallel computing in which algorithms and software are designed around efficient access and traversal of a data set, where hardware requirements are dictated by data size as much as by desired run times, and where the goal is usually to distill compact results from massive data.

  6. PARAMO: A Parallel Predictive Modeling Platform for Healthcare Analytic Research using Electronic Health Records

    PubMed Central

    Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R.; Stewart, Walter F.; Malin, Bradley; Sun, Jimeng

    2014-01-01

    Objective Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: 1) cohort construction, 2) feature construction, 3) cross-validation, 4) feature selection, and 5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. Methods To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which 1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, 2) schedules the tasks in a topological ordering of the graph, and 3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. Results We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 hours in parallel compared to 9 days if running sequentially. Conclusion This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines that are specialized for health data researchers. PMID:24370496
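
    The scheduling scheme described above (a dependency graph of pipeline tasks executed in topological order) is easy to sketch with the Python standard library alone; the task names below are illustrative placeholders, not PARAMO's actual Map-Reduce tasks:

        import concurrent.futures as cf
        from graphlib import TopologicalSorter

        # Toy pipeline: each task maps to the set of tasks it depends on
        deps = {
            "cohort": set(),
            "features": {"cohort"},
            "cv_split": {"features"},
            "model_A": {"cv_split"},   # independent models can run in parallel
            "model_B": {"cv_split"},
            "model_C": {"cv_split"},
        }

        def run_task(name):
            print("running", name)
            return name

        ts = TopologicalSorter(deps)
        ts.prepare()
        with cf.ThreadPoolExecutor(max_workers=4) as pool:
            while ts.is_active():
                ready = list(ts.get_ready())            # all deps satisfied
                for done in pool.map(run_task, ready):  # execute batch in parallel
                    ts.done(done)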

  7. PARAMO: a PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records.

    PubMed

    Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R; Stewart, Walter F; Malin, Bradley; Sun, Jimeng

    2014-04-01

    Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: (1) cohort construction, (2) feature construction, (3) cross-validation, (4) feature selection, and (5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which (1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, (2) schedules the tasks in a topological ordering of the graph, and (3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 h in parallel compared to 9 days if running sequentially. This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines that are specialized for health data researchers.

  8. Architecture Adaptive Computing Environment

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    2006-01-01

    Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.

  9. The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code.

    PubMed

    Kunkel, Susanne; Schenck, Wolfram

    2017-01-01

    NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling.
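
    The dry-run trick, one process performing every simulation phase except communication while pretending to be one rank of many, can be captured in a few lines; the class below is a hypothetical illustration, not NEST's API:

        class DryRunSim:
            """Toy version of the dry-run idea: one process mimics rank `rank`
            of `num_ranks` and performs every step except communication."""

            def __init__(self, rank, num_ranks, dry_run=True):
                self.rank, self.num_ranks, self.dry_run = rank, num_ranks, dry_run

            def exchange_spikes(self, send_buffers):
                # Buffers are built and sized exactly as in a real parallel run,
                # so memory footprint and packing time stay representative ...
                if self.dry_run:
                    return [[] for _ in range(self.num_ranks)]  # ... minus the exchange
                raise NotImplementedError("a real run would do an MPI alltoall here")

            def step(self):
                send = [[("spike", self.rank)] for _ in range(self.num_ranks)]
                return self.exchange_spikes(send)

        sim = DryRunSim(rank=0, num_ranks=1024)
        sim.step()  # runs on a laptop as if it were 1 of 1024 MPI processes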

  10. The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code

    PubMed Central

    Kunkel, Susanne; Schenck, Wolfram

    2017-01-01

    NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling. PMID:28701946

  11. Visual Computing Environment

    NASA Technical Reports Server (NTRS)

    Lawrence, Charles; Putt, Charles W.

    1997-01-01

    The Visual Computing Environment (VCE) is a NASA Lewis Research Center project to develop a framework for intercomponent and multidisciplinary computational simulations. Many current engineering analysis codes simulate various aspects of aircraft engine operation. For example, existing computational fluid dynamics (CFD) codes can model the airflow through individual engine components such as the inlet, compressor, combustor, turbine, or nozzle. Currently, these codes are run in isolation, making intercomponent and complete system simulations very difficult to perform. In addition, management and utilization of these engineering codes for coupled component simulations is a complex, laborious task, requiring substantial experience and effort. To facilitate multicomponent aircraft engine analysis, the CFD Research Corporation (CFDRC) is developing the VCE system. This system, which is part of NASA's Numerical Propulsion Simulation System (NPSS) program, can couple various engineering disciplines, such as CFD, structural analysis, and thermal analysis. The objectives of VCE are to (1) develop a visual computing environment for controlling the execution of individual simulation codes that are running in parallel and are distributed on heterogeneous host machines in a networked environment, (2) develop numerical coupling algorithms for interchanging boundary conditions between codes with arbitrary grid matching and different levels of dimensionality, (3) provide a graphical interface for simulation setup and control, and (4) provide tools for online visualization and plotting. VCE was designed to provide a distributed, object-oriented environment. Mechanisms are provided for creating and manipulating objects, such as grids, boundary conditions, and solution data. This environment includes parallel virtual machine (PVM) for distributed processing. Users can interactively select and couple any set of codes that have been modified to run in a parallel distributed fashion on a cluster of heterogeneous workstations. A scripting facility allows users to dictate the sequence of events that make up the particular simulation.

  12. PeakRanger: A cloud-enabled peak caller for ChIP-seq data

    PubMed Central

    2011-01-01

    Background Chromatin immunoprecipitation (ChIP), coupled with massively parallel short-read sequencing (seq) is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned either to handle punctate sites, such as transcriptional factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks. Results In this paper, we introduce PeakRanger, a peak caller software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real world usages of PeakRanger, including peak-calling in the modENCODE project. Conclusions Compared to other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in terms of identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded at the official site of modENCODE project: http://www.modencode.org/software/ranger/ PMID:21554709
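
    At its simplest, calling punctate peaks from a coverage track means finding maximal runs above a threshold. The sketch below shows only that baseline idea; it is not PeakRanger's algorithm, which adds much more (broad regions, summit resolution, MapReduce parallelism):

        import numpy as np

        def call_peaks(coverage, threshold, min_len=3):
            """Return (start, end) of maximal runs where coverage >= threshold."""
            above = coverage >= threshold
            peaks, start = [], None
            for i, flag in enumerate(above):
                if flag and start is None:
                    start = i
                elif not flag and start is not None:
                    if i - start >= min_len:
                        peaks.append((start, i))
                    start = None
            if start is not None and len(coverage) - start >= min_len:
                peaks.append((start, len(coverage)))
            return peaks

        cov = np.array([0, 1, 5, 7, 6, 2, 0, 0, 4, 8, 9, 3, 0])
        print(call_peaks(cov, threshold=4))  # [(2, 5), (8, 11)]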

  13. Limits to high-speed simulations of spiking neural networks using general-purpose computers.

    PubMed

    Zenke, Friedemann; Gerstner, Wulfram

    2014-01-01

    To understand how the central nervous system performs computations using recurrent neuronal circuitry, simulations have become an indispensable tool for theoretical neuroscience. To study neuronal circuits and their ability to self-organize, increasing attention has been directed toward synaptic plasticity. In particular spike-timing-dependent plasticity (STDP) creates specific demands for simulations of spiking neural networks. On the one hand a high temporal resolution is required to capture the millisecond timescale of typical STDP windows. On the other hand network simulations have to evolve over hours up to days, to capture the timescale of long-term plasticity. To do this efficiently, fast simulation speed is the crucial ingredient rather than large neuron numbers. Using different medium-sized network models consisting of several thousands of neurons and off-the-shelf hardware, we compare the simulation speed of the simulators: Brian, NEST and Neuron as well as our own simulator Auryn. Our results show that real-time simulations of different plastic network models are possible in parallel simulations in which numerical precision is not a primary concern. Even so, the speed-up margin of parallelism is limited and boosting simulation speeds beyond one tenth of real-time is difficult. By profiling simulation code we show that the run times of typical plastic network simulations encounter a hard boundary. This limit is partly due to latencies in the inter-process communications and thus cannot be overcome by increased parallelism. Overall, these results show that to study plasticity in medium-sized spiking neural networks, adequate simulation tools are readily available which run efficiently on small clusters. However, to run simulations substantially faster than real-time, special hardware is a prerequisite.

  14. Massively parallel algorithms for trace-driven cache simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

    1991-01-01

    Trace-driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If at the t-th instant x_t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x_t present for the (t+1)-st instant. The problem of parallel simulation of a subtrace of N references directed to a C-line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regardless of the set size C runs in time O(log N) using N processors on the exclusive-read, exclusive-write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C-line set runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
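
    As a point of reference for what these parallel algorithms compute, the serial LRU set simulation is straightforward; this sketch (plain Python, one C-line set) counts the misses that the O(log N) EREW algorithm determines in parallel:

        from collections import OrderedDict

        def lru_misses(trace, capacity):
            """Simulate one C-line LRU cache set; return the number of misses."""
            cache = OrderedDict()  # keys kept in LRU -> MRU order
            misses = 0
            for x in trace:
                if x in cache:
                    cache.move_to_end(x)        # hit: mark most recently used
                else:
                    misses += 1                 # miss: evict LRU line if full
                    if len(cache) == capacity:
                        cache.popitem(last=False)
                    cache[x] = True
            return misses

        print(lru_misses([1, 2, 3, 1, 4, 1, 2], capacity=3))  # 5 misses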

  15. VINE-A NUMERICAL CODE FOR SIMULATING ASTROPHYSICAL SYSTEMS USING PARTICLES. II. IMPLEMENTATION AND PERFORMANCE CHARACTERISTICS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nelson, Andrew F.; Wetzstein, M.; Naab, T.

    2009-10-01

    We continue our presentation of VINE. In this paper, we begin with a description of relevant architectural properties of the serial and shared memory parallel computers on which VINE is intended to run, and describe their influences on the design of the code itself. We continue with a detailed description of a number of optimizations made to the layout of the particle data in memory and to our implementation of a binary tree used to access that data for use in gravitational force calculations and searches for smoothed particle hydrodynamics (SPH) neighbor particles. We describe the modifications to the code necessary to obtain forces efficiently from special purpose 'GRAPE' hardware, the interfaces required to allow transparent substitution of those forces in the code instead of those obtained from the tree, and the modifications necessary to use both tree and GRAPE together as a fused GRAPE/tree combination. We conclude with an extensive series of performance tests, which demonstrate that the code can be run efficiently and without modification in serial on small workstations or in parallel using the OpenMP compiler directives on large-scale, shared memory parallel machines. We analyze the effects of the code optimizations and estimate that they improve its overall performance by more than an order of magnitude over that obtained by many other tree codes. Scaled parallel performance of the gravity and SPH calculations, together the most costly components of most simulations, is nearly linear up to at least 120 processors on moderate sized test problems using the Origin 3000 architecture, and to the maximum machine sizes available to us on several other architectures. At similar accuracy, performance of VINE, used in GRAPE-tree mode, is approximately a factor 2 slower than that of VINE, used in host-only mode. Further optimizations of the GRAPE/host communications could improve the speed by as much as a factor of 3, but have not yet been implemented in VINE. Finally, we find that although parallel performance on small problems may reach a plateau beyond which more processors bring no additional speedup, performance never decreases, a factor important for running large simulations on many processors with individual time steps, where only a small fraction of the total particles require updates at any given moment.

  16. Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, Xuanhua; Luo, Xuan; Liang, Junling

    GPUs have been increasingly used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, consistent asynchronous computing requires locking or atomic operations, leading to significant penalties/overheads when implemented on GPUs. As such, a coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of the large number of colors generally required for processing a large-scale graph with billions of vertices. We propose a light-weight asynchronous processing framework called Frog with a preprocessing/hybrid coloring model. The fundamental idea is based on the Pareto principle (or 80-20 rule) about coloring algorithms, as we observed across a large number of real-world graph coloring cases. We find that a majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices in a sparse graph with maximized parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers on PCIe while processing each partition. We conduct experiments on real-world data (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk and Twitter) to evaluate our approach and make comparisons with well-known non-preprocessed (such as Totem, Medusa, MapGraph and Gunrock) and preprocessed (CuSha) approaches, by testing four classical algorithms (BFS, PageRank, SSSP and CC). On all the tested applications and datasets, Frog is able to significantly outperform existing GPU-based graph processing systems except Gunrock and MapGraph. MapGraph gets better performance than Frog when running BFS on RoadNet-CA. The comparison between Gunrock and Frog is inconclusive: Frog outperforms Gunrock by more than 1.04X when running PageRank and SSSP, while its advantage is not obvious when running BFS and CC on some datasets, especially RoadNet-CA.
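
    The preprocessing step at the heart of this approach is an ordinary greedy coloring, after which all vertices of one color form a conflict-free batch that can be updated in parallel. A minimal sketch (illustrative only, not Frog's GPU implementation):

        from collections import defaultdict

        def greedy_coloring(adj):
            """Assign each vertex the smallest color unused by its neighbors."""
            color = {}
            for v in adj:
                taken = {color[u] for u in adj[v] if u in color}
                c = 0
                while c in taken:
                    c += 1
                color[v] = c
            return color

        adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
        color = greedy_coloring(adj)

        # Vertices sharing a color form a conflict-free batch for parallel updates
        batches = defaultdict(list)
        for v, c in color.items():
            batches[c].append(v)
        print(dict(batches))  # {0: [0, 3], 1: [1], 2: [2]}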

  17. Scalable Metropolis Monte Carlo for simulation of hard shapes

    NASA Astrophysics Data System (ADS)

    Anderson, Joshua A.; Irrgang, M. Eric; Glotzer, Sharon C.

    2016-07-01

    We design and implement a scalable hard particle Monte Carlo simulation toolkit (HPMC), and release it open source as part of HOOMD-blue. HPMC runs in parallel on many CPUs and many GPUs using domain decomposition. We employ BVH trees instead of cell lists on the CPU for fast performance, especially with large particle size disparity, and optimize inner loops with SIMD vector intrinsics on the CPU. Our GPU kernel proposes many trial moves in parallel on a checkerboard and uses a block-level queue to redistribute work among threads and avoid divergence. HPMC supports a wide variety of shape classes, including spheres/disks, unions of spheres, convex polygons, convex spheropolygons, concave polygons, ellipsoids/ellipses, convex polyhedra, convex spheropolyhedra, spheres cut by planes, and concave polyhedra. NVT and NPT ensembles can be run in 2D or 3D triclinic boxes. Additional integration schemes permit Frenkel-Ladd free energy computations and implicit depletant simulations. In a benchmark system of a fluid of 4096 pentagons, HPMC performs 10 million sweeps in 10 min on 96 CPU cores on XSEDE Comet. The same simulation would take 7.6 h in serial. HPMC also scales to large system sizes, and the same benchmark with 16.8 million particles runs in 1.4 h on 2048 GPUs on OLCF Titan.
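
    The Metropolis move for hard particles is conceptually tiny: propose a small displacement and accept it only if no overlap results. A serial hard-disk sketch with a periodic box (plain Python; none of HPMC's BVH trees, SIMD vectorization, or GPU queueing):

        import math, random

        def overlaps(pos, i, trial, diameter, box):
            """Would disk i at `trial` overlap any other disk? (periodic box)"""
            for j, (xj, yj) in enumerate(pos):
                if j == i:
                    continue
                dx = (trial[0] - xj + box / 2) % box - box / 2  # minimum image
                dy = (trial[1] - yj + box / 2) % box - box / 2
                if math.hypot(dx, dy) < diameter:
                    return True
            return False

        def sweep(pos, diameter, box, max_disp=0.1):
            """One Monte Carlo sweep; returns the acceptance fraction."""
            accepted = 0
            for i, (x, y) in enumerate(pos):
                trial = ((x + random.uniform(-max_disp, max_disp)) % box,
                         (y + random.uniform(-max_disp, max_disp)) % box)
                if not overlaps(pos, i, trial, diameter, box):
                    pos[i] = trial  # hard particles: accept iff no overlap
                    accepted += 1
            return accepted / len(pos)

        disks = [(0.5, 0.5), (2.5, 2.5), (4.0, 1.0)]
        print(sweep(disks, diameter=1.0, box=5.0))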

  18. A Comparison of Hybrid Reynolds Averaged Navier Stokes/Large Eddy Simulation (RANS/LES) and Unsteady RANS Predictions of Separated Flow for a Variable Speed Power Turbine Blade Operating with Low Inlet Turbulence Levels

    DTIC Science & Technology

    2017-10-01

    Facility is a large-scale cascade that allows detailed flow field surveys and blade surface measurements. The facility has a continuous run ... structured grids at 2 flow conditions, cruise and takeoff, of the VSPT blade. Computations were run in parallel on a Department of Defense ... RANS/LES) and Unsteady RANS Predictions of Separated Flow for a Variable-Speed Power-Turbine Blade Operating with Low Inlet Turbulence Levels

  19. Molecular-dynamics simulations of self-assembled monolayers (SAM) on parallel computers

    NASA Astrophysics Data System (ADS)

    Vemparala, Satyavani

    The purpose of this dissertation is to investigate the properties of self-assembled monolayers, particularly alkanethiols and poly(ethylene glycol)-terminated alkanethiols. These simulations are based on realistic interatomic potentials and require scalable and portable multiresolution algorithms implemented on parallel computers. Large-scale molecular dynamics simulations of self-assembled alkanethiol monolayer systems have been carried out using an all-atom model involving a million atoms to investigate their structural properties as a function of temperature, lattice spacing and molecular chain-length. Results show that the alkanethiol chains tilt from the surface normal by a collective angle of 25° along the next-nearest neighbor direction at 300 K. At 350 K the system transforms to a disordered phase characterized by small tilt angle, flexible tilt direction, and random distribution of backbone planes. With increasing lattice spacing, a, the tilt angle increases rapidly from a nearly zero value at a = 4.7 Å to as high as 34° at a = 5.3 Å at 300 K. We also studied the effect of end groups on the tilt structure of SAM films. We characterized the system with respect to temperature, the alkane chain length, lattice spacing, and the length of the end group. We found that the gauche defects were predominant only in the tails, and the gauche defects increased with the temperature and number of EG units. The effect of an electric field on the structure of a poly(ethylene glycol) (PEG)-terminated alkanethiol self-assembled monolayer (SAM) on gold has been studied using a parallel molecular dynamics method. An applied electric field triggers a conformational transition from all-trans to a mostly gauche conformation. The polarity of the electric field has a significant effect on the surface structure of PEG, leading to a profound effect on the hydrophilicity of the surface. The electric field applied anti-parallel to the surface normal causes a reversible transition to an ordered state in which the oxygen atoms are exposed. On the other hand, an electric field applied in a direction parallel to the surface normal introduces considerable disorder in the system and the oxygen atoms are buried inside.

  20. Concentration and saturation effects of tethered polymer chains on adsorbing surfaces

    NASA Astrophysics Data System (ADS)

    Descas, Radu; Sommer, Jens-Uwe; Blumen, Alexander

    2006-12-01

    We consider end-grafted chains at an adsorbing surface under good solvent conditions using Monte Carlo simulations and scaling arguments. Grafting of chains allows us to fix the surface concentration and to study a wide range of surface concentrations from the undersaturated state of the surface up to the brushlike regime. The average extension of single chains in the direction parallel and perpendicular to the surface is analyzed using scaling arguments for the two-dimensional semidilute surface state according to Bouchaud and Daoud [J. Phys. (Paris) 48, 1991 (1987)]. We find good agreement with the scaling predictions for the scaling in the direction parallel to the surface and for surface concentrations much below the saturation concentration (dense packing of adsorption blobs). Increasing the grafting density we study the saturation effects and the oversaturation of the adsorption layer. In order to account for the effect of excluded volume on the adsorption free energy we introduce a new scaling variable related with the saturation concentration of the adsorption layer (saturation scaling). We show that the decrease of the single chain order parameter (the fraction of adsorbed monomers on the surface) with increasing concentration, being constant in the ideal semidilute surface state, is properly described by saturation scaling only. Furthermore, the simulation results for the chains' extension from higher surface concentrations up to the oversaturated state support the new scaling approach. The oversaturated state can be understood using a geometrical model which assumes a brushlike layer on top of a saturated adsorption layer. We provide evidence that adsorbed polymer layers are very sensitive to saturation effects, which start to influence the semidilute surface scaling even much below the saturation threshold.

  1. Position Paper - pFLogger: The Parallel Fortran Logging framework for HPC Applications

    NASA Technical Reports Server (NTRS)

    Clune, Thomas L.; Cruz, Carlos A.

    2017-01-01

    In the context of high performance computing (HPC), software investments in support of text-based diagnostics, which monitor a running application, are typically limited compared to those for other types of IO. Examples of such diagnostics include reiteration of configuration parameters, progress indicators, simple metrics (e.g., mass conservation, convergence of solvers, etc.), and timers. To some degree, this difference in priority is justifiable as other forms of output are the primary products of a scientific model and, due to their large data volume, much more likely to be a significant performance concern. In contrast, text-based diagnostic content is generally not shared beyond the individual or group running an application and is most often used to troubleshoot when something goes wrong. We suggest that a more systematic approach enabled by a logging facility (or logger) similar to those routinely used by many communities would provide significant value to complex scientific applications. In the context of high-performance computing, an appropriate logger would provide specialized support for distributed and shared-memory parallelism and have low performance overhead. In this paper, we present our prototype implementation of pFlogger, a parallel Fortran-based logging framework, and assess its suitability for use in a complex scientific application.
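
    To make the idea concrete, the sketch below tags every log record with a process rank using Python's standard logging module. It is a generic illustration of a rank-aware logger, not pFlogger's Fortran API; the rank lookup is stubbed so the example runs without MPI:

        import logging

        def get_rank():
            # Stub: in a real MPI program this would be comm.Get_rank() (e.g. mpi4py)
            return 0

        class RankFilter(logging.Filter):
            """Attach the process rank to every log record."""
            def filter(self, record):
                record.rank = get_rank()
                return True

        logger = logging.getLogger("app")
        logger.setLevel(logging.INFO)
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s rank=%(rank)d %(levelname)s: %(message)s"))
        handler.addFilter(RankFilter())
        logger.addHandler(handler)

        logger.info("solver converged in %d iterations", 42)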

  2. POSITION PAPER - pFLogger: The Parallel Fortran Logging Framework for HPC Applications

    NASA Technical Reports Server (NTRS)

    Clune, Thomas L.; Cruz, Carlos A.

    2017-01-01

    In the context of high performance computing (HPC), software investments in support of text-based diagnostics, which monitor a running application, are typically limited compared to those for other types of IO. Examples of such diagnostics include reiteration of configuration parameters, progress indicators, simple metrics (e.g., mass conservation, convergence of solvers, etc.), and timers. To some degree, this difference in priority is justifiable as other forms of output are the primary products of a scientific model and, due to their large data volume, much more likely to be a significant performance concern. In contrast, text-based diagnostic content is generally not shared beyond the individual or group running an application and is most often used to troubleshoot when something goes wrong. We suggest that a more systematic approach enabled by a logging facility (or 'logger') similar to those routinely used by many communities would provide significant value to complex scientific applications. In the context of high-performance computing, an appropriate logger would provide specialized support for distributed and shared-memory parallelism and have low performance overhead. In this paper, we present our prototype implementation of pFlogger - a parallel Fortran-based logging framework, and assess its suitability for use in a complex scientific application.

  3. Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Turney, Raymond D.

    2001-01-01

    This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message-passing (MPI) codes and shared-memory, compiler-directive codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.

  4. Multi-Resolution Climate Ensemble Parameter Analysis with Nested Parallel Coordinates Plots.

    PubMed

    Wang, Junpeng; Liu, Xiaotong; Shen, Han-Wei; Lin, Guang

    2017-01-01

    Due to the uncertain nature of weather prediction, climate simulations are usually performed multiple times with different spatial resolutions. The outputs of simulations are multi-resolution spatial temporal ensembles. Each simulation run uses a unique set of values for multiple convective parameters. Distinct parameter settings from different simulation runs in different resolutions constitute a multi-resolution high-dimensional parameter space. Understanding the correlation between the different convective parameters, and establishing a connection between the parameter settings and the ensemble outputs are crucial to domain scientists. The multi-resolution high-dimensional parameter space, however, presents a unique challenge to the existing correlation visualization techniques. We present Nested Parallel Coordinates Plot (NPCP), a new type of parallel coordinates plots that enables visualization of intra-resolution and inter-resolution parameter correlations. With flexible user control, NPCP integrates superimposition, juxtaposition and explicit encodings in a single view for comparative data visualization and analysis. We develop an integrated visual analytics system to help domain scientists understand the connection between multi-resolution convective parameters and the large spatial temporal ensembles. Our system presents intricate climate ensembles with a comprehensive overview and on-demand geographic details. We demonstrate NPCP, along with the climate ensemble visualization system, based on real-world use-cases from our collaborators in computational and predictive science.

  5. TOUGH2_MP: A parallel version of TOUGH2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris

    2003-04-09

    TOUGH2_MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and the AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version of the code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. In addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we review the development of TOUGH2_MP, and discuss its basic features, modules, and their applications.

  6. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.
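
    The first job of such domain-decomposition tools is deciding which contiguous slab of the grid each processor owns. A minimal sketch of that bookkeeping (Python here for brevity, rather than the Fortran 77 interface the report presents):

        def local_range(n_cells, n_procs, rank):
            """Split n_cells as evenly as possible; return [lo, hi) owned by rank."""
            base, extra = divmod(n_cells, n_procs)
            lo = rank * base + min(rank, extra)   # first `extra` ranks get one more
            hi = lo + base + (1 if rank < extra else 0)
            return lo, hi

        # 10 cells over 3 processors -> (0, 4), (4, 7), (7, 10)
        for r in range(3):
            print(r, local_range(10, 3, r))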

  7. Large-scale three-dimensional phase-field simulations for phase coarsening at ultrahigh volume fraction on high-performance architectures

    NASA Astrophysics Data System (ADS)

    Yan, Hui; Wang, K. G.; Jones, Jim E.

    2016-06-01

    A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics of phase coarsening in the region of ultrahigh volume fraction is found. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables an increase in the three-dimensional simulation system size up to a 512³ grid cube. Through the parallelized code, practical runtimes can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis on speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for prediction of runtime is developed, which shows good agreement with actual run times from numerical tests.

  8. Parallel VLSI architecture emulation and the organization of APSA/MPP

    NASA Technical Reports Server (NTRS)

    O'Donnell, John T.

    1987-01-01

    The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a Vax. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.

  9. Running With an Elastic Lower Limb Exoskeleton.

    PubMed

    Cherry, Michael S; Kota, Sridhar; Young, Aaron; Ferris, Daniel P

    2016-06-01

    Although there have been many lower limb robotic exoskeletons that have been tested for human walking, few devices have been tested for assisting running. It is possible that a pseudo-passive elastic exoskeleton could benefit human running without the addition of electrical motors due to the spring-like behavior of the human leg. We developed an elastic lower limb exoskeleton that added stiffness in parallel with the entire lower limb. Six healthy, young subjects ran on a treadmill at 2.3 m/s with and without the exoskeleton. Although the exoskeleton was designed to provide ~50% of normal leg stiffness during running, it only provided 24% of leg stiffness during testing. The difference in added leg stiffness was primarily due to soft tissue compression and harness compliance decreasing exoskeleton displacement during stance. As a result, the exoskeleton only supported about 7% of the peak vertical ground reaction force. There was a significant increase in metabolic cost when running with the exoskeleton compared with running without the exoskeleton (ANOVA, P < .01). We conclude that 2 major roadblocks to designing successful lower limb robotic exoskeletons for human running are human-machine interface compliance and the extra lower limb inertia from the exoskeleton.
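
    The stiffness accounting above follows the standard spring-mass model of running, k_leg = F_peak/Δx. A worked example with made-up numbers (not the study's measurements) shows how interface compliance dilutes the stiffness an exoskeleton actually delivers:

        # Spring-mass model of running: leg stiffness = peak force / compression
        F_peak = 1800.0          # N, hypothetical peak vertical GRF
        dx_leg = 0.12            # m, hypothetical peak leg compression
        k_leg = F_peak / dx_leg  # = 15 kN/m

        k_exo = 0.5 * k_leg      # design target: ~50% of leg stiffness in parallel

        # If harness/soft-tissue compliance lets the exoskeleton compress only
        # half as far as the leg, it stores force over less travel:
        dx_exo = 0.5 * dx_leg
        F_exo = k_exo * dx_exo
        print(f"exoskeleton share of peak force: {F_exo / F_peak:.0%}")  # 25%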

  10. A novel open-framework with non-crossing channels in the uranyl vanadates A(UO2)4(VO4)3 (A = Li, Na)

    NASA Astrophysics Data System (ADS)

    Obbade, S.; Dion, C.; Rivenet, M.; Saadi, M.; Abraham, F.

    2004-06-01

    A new sodium uranyl vanadate, Na(UO2)4(VO4)3, has been synthesized by solid-state reaction and its structure determined from single-crystal X-ray diffraction data. It crystallizes in tetragonal symmetry with space group I41/amd and the following cell parameters: a = 7.2267(4) Å and c = 34.079(4) Å, V = 1779.8(2) Å³, Z = 4, with ρmes = 5.36(3) g/cm³ and ρcal = 5.40(2) g/cm³. A full-matrix least-squares refinement on the basis of F² yielded R1 = 0.028 and wR2 = 0.056 for 52 parameters with 474 independent reflections with I ⩾ 2σ(I) collected on a BRUKER AXS diffractometer with Mo Kα radiation and a CCD detector. The crystal structure is characterized by ∞2[(UO2)2(VO4)] sheets parallel to (001), formed by corner-shared UO6 distorted octahedra and V(2)O4 tetrahedra, connected by V(1)O4 tetrahedra to ∞1[UO5]4- chains of edge-shared UO7 pentagonal bipyramids alternately parallel to the a- and b-axis. The resulting three-dimensional framework creates mono-dimensional channels running down the a- and b-axis, formed by face-shared oxygen octahedra half occupied by Na. The powder of the Li analog compound Li(UO2)4(VO4)3 has been synthesized by solid-state reaction. The two compounds exhibit high mobility of the alkaline ions within the two-dimensional network of non-intersecting channels.

  11. Parallel Task Management Library for MARTe

    NASA Astrophysics Data System (ADS)

    Valcarcel, Daniel F.; Alves, Diogo; Neto, Andre; Reux, Cedric; Carvalho, Bernardo B.; Felton, Robert; Lomas, Peter J.; Sousa, Jorge; Zabeo, Luca

    2014-06-01

    The Multithreaded Application Real-Time executor (MARTe) is a real-time framework with increasing popularity and support in the thermonuclear fusion community. It allows modular code to run in a multi-threaded environment, leveraging current multi-core processor (CPU) technology. One application that relies on the MARTe framework is the Joint European Torus (JET) tokamak WAll Load Limiter System (WALLS). It calculates and monitors the temperature on metal tiles and plasma facing components (PFCs) that can melt or flake if their temperature gets too high when exposed to power loads. One of the main time-consuming tasks in WALLS is the calculation of thermal diffusion models in real-time. These models tend to be described by very large state-space models, thus making them perfect candidates for parallelisation. MARTe's traditional approach for task parallelisation is to split the problem into several Real-Time Threads, each responsible for a self-contained sequential execution of an input-to-output chain. This is usually possible, but it might not always be practical for algorithmic or technical reasons. Also, it might not be easily scalable with an increase in the number of available CPU cores. The WorkLibrary introduces a “GPU-like approach” of splitting work among the available cores of modern CPUs that is (i) straightforward to use in an application, (ii) scalable with the availability of cores, and all of this (iii) without rewriting or recompiling the source code. The first part of this article explains the motivation behind the library, its architecture and implementation. The second part presents a real application for WALLS, a parallel version of a large state-space model describing the 2D thermal diffusion on a JET tile.
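
    The workloads the WorkLibrary targets, such as a large state-space update x' = A·x, split naturally into independent row blocks, which is the essence of the “GPU-like” division of work. A small numpy/threads illustration (not MARTe's C++ API):

        import numpy as np
        from concurrent.futures import ThreadPoolExecutor

        n = 2000                             # hypothetical state dimension
        A = np.random.rand(n, n) * 1e-3      # state matrix (illustrative)
        x = np.random.rand(n)

        def row_block(lo, hi):
            return A[lo:hi] @ x              # each worker owns a slice of x'

        bounds = [(i * n // 4, (i + 1) * n // 4) for i in range(4)]
        with ThreadPoolExecutor(max_workers=4) as pool:
            parts = list(pool.map(lambda b: row_block(*b), bounds))
        x_next = np.concatenate(parts)

        assert np.allclose(x_next, A @ x)    # matches the sequential product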

  12. Vapor-liquid equilibrium and critical asymmetry of square well and short square well chain fluids.

    PubMed

    Li, Liyan; Sun, Fangfang; Chen, Zhitong; Wang, Long; Cai, Jun

    2014-08-07

    The critical behavior of square well fluids with variable interaction ranges and of short square well chain fluids has been investigated by grand canonical ensemble Monte Carlo simulations. The critical temperatures and densities were estimated by a finite-size scaling analysis with the help of the histogram reweighting technique. The vapor-liquid coexistence curve in the near-critical region was determined using hyper-parallel tempering Monte Carlo simulations. The simulation results for coexistence diameters show that the contribution of |t|^(1-α) to the coexistence diameter dominates the singular behavior in all systems investigated. The contribution of |t|^(2β) to the coexistence diameter is larger for systems with a smaller interaction range λ, while for short square well chain fluids, the longer the chain length, the larger the contribution of |t|^(2β). The molecular configuration greatly influences the critical asymmetry: a short soft chain fluid shows weaker critical asymmetry than a stiff chain fluid of the same chain length.
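
    The singular terms discussed here enter through the standard expansion of the coexistence diameter, (ρ_liq + ρ_vap)/2 = ρ_c + a|t|^(2β) + b|t|^(1-α) + c|t| with t = (T - T_c)/T_c. Fitting that form takes a few lines of scipy; the data below are synthetic and the 3D Ising exponents are assumed, so this reproduces the method, not the paper's results:

        import numpy as np
        from scipy.optimize import curve_fit

        alpha, beta = 0.110, 0.326           # 3D Ising exponents (assumed)

        def diameter(t, rho_c, a, b, c):
            t = np.abs(t)
            return rho_c + a * t**(2 * beta) + b * t**(1 - alpha) + c * t

        t = -np.logspace(-4, -1, 40)         # reduced temperatures below T_c
        rho_d = diameter(t, 0.30, 0.05, 0.40, 0.20)      # synthetic "data"
        rho_d += np.random.normal(0.0, 1e-4, t.size)     # plus noise

        popt, _ = curve_fit(diameter, t, rho_d, p0=[0.3, 0.0, 0.0, 0.0])
        print("rho_c, a, b, c =", np.round(popt, 3))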

  13. Immersion Suit Flotation Testing REACT Report

    DTIC Science & Technology

    2016-08-01

    wind-generated motion, we used a 75-pound pyramid anchor with 20 feet of 3/8-inch mooring chain. As with the ballasted mannequin, the team fully ... everything, packed it and shipped it to JMTF Mobile where the team would reassemble the gear for in-water deployment. This included both 75-pound anchors ... first mooring on the ramp, put the tethered buoy in the water, then put the anchor over the side, allowing the chain to run free. Next, the team

  14. Scalable Unix tools on parallel processors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gropp, W.; Lusk, E.

    1994-12-31

    The introduction of parallel processors that run a separate copy of Unix on each processor has introduced new problems in managing the user's environment. This paper discusses some generalizations of common Unix commands for managing files (e.g., ls) and processes (e.g., ps) that are convenient and scalable. These basic tools, just like their Unix counterparts, are text-based. We also discuss a way to use these with a graphical user interface (GUI). Some notes on the implementation are provided. Prototypes of these commands are publicly available.
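
    The same idea is easy to sketch today: fan a command out to many hosts in parallel and collect the output. The snippet below uses only the Python standard library plus ssh; the hostnames are placeholders, and this is an illustration of the concept rather than the tools from the record:

        import subprocess
        from concurrent.futures import ThreadPoolExecutor

        HOSTS = ["node01", "node02", "node03"]   # placeholder hostnames

        def remote(host, command="ps -ef | wc -l"):
            """Run a command on one host via ssh; return (host, output or error)."""
            try:
                out = subprocess.run(["ssh", host, command],
                                     capture_output=True, text=True, timeout=10)
                return host, out.stdout.strip() or out.stderr.strip()
            except subprocess.TimeoutExpired:
                return host, "timeout"

        with ThreadPoolExecutor(max_workers=len(HOSTS)) as pool:
            for host, result in pool.map(remote, HOSTS):
                print(f"{host}: {result}")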

  15. A Queue Simulation Tool for a High Performance Scientific Computing Center

    NASA Technical Reports Server (NTRS)

    Spear, Carrie; McGalliard, James

    2007-01-01

    The NASA Center for Computational Sciences (NCCS) at the Goddard Space Flight Center provides high-performance, highly parallel processors, mass storage, and supporting infrastructure to a community of computational Earth and space scientists. Long-running (days) and highly parallel (hundreds of CPUs) jobs are common in the workload. NCCS management structures batch queues and allocates resources to optimize system use and prioritize workloads. NCCS technical staff use a locally developed discrete event simulation tool to model the impacts of evolving workloads, potential system upgrades, alternative queue structures and resource allocation policies.
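
    A discrete event simulation of a batch queue needs little more than an event heap: jobs arrive, wait until enough CPUs are free, run, and release them. A toy FCFS sketch (the NCCS tool models far richer queue structures and policies):

        import heapq

        def simulate(jobs, total_cpus):
            """jobs: list of (arrival, cpus, runtime). Returns mean wait (FCFS)."""
            events = []   # heap of (time, tag, cpus, runtime); tag 0=arrival, 1=done
            for arr, cpus, run in jobs:
                heapq.heappush(events, (arr, 0, cpus, run))
            free, queue, waits = total_cpus, [], []
            while events:
                time, tag, cpus, run = heapq.heappop(events)
                if tag == 0:
                    queue.append((time, cpus, run))
                else:
                    free += cpus                        # completion frees CPUs
                while queue and queue[0][1] <= free:    # start head of queue if it fits
                    arr, c, r = queue.pop(0)
                    free -= c
                    waits.append(time - arr)
                    heapq.heappush(events, (time + r, 1, c, r))
            return sum(waits) / len(waits)

        print(simulate([(0, 4, 10), (1, 4, 10), (2, 8, 5)], total_cpus=8))  # 3.0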

  16. Parallel File System I/O Performance Testing On LANL Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wiens, Isaac Christian; Green, Jennifer Kathleen

    2016-08-18

    These are slides from a presentation on parallel file system I/O performance testing on LANL clusters. I/O is a known bottleneck for HPC applications. Performance optimization of I/O is often required. This summer project entailed integrating IOR under Pavilion and automating the results analysis. The slides cover the following topics: scope of the work, tools utilized, IOR-Pavilion test workflow, build script, IOR parameters, how parameters are passed to IOR, *run_ior: functionality, Python IOR-Output Parser, Splunk data format, Splunk dashboard and features, and future work.

  17. Parallel Climate Data Assimilation PSAS Package

    NASA Technical Reports Server (NTRS)

    Ding, Hong Q.; Chan, Clara; Gennery, Donald B.; Ferraro, Robert D.

    1996-01-01

    We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to a 512-node Intel Paragon. The equation solver achieves a sustained 18 Gflops performance. As a result, we achieved an unprecedented 100-fold solution-time reduction on the Intel Paragon parallel platform over the Cray C90. This not only meets and exceeds the DAO time requirements, but also significantly enlarges the window of exploration in climate data assimilations.

  18. Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains.

    PubMed

    Jha, Ashwani; Flurchick, K M; Bikdash, Marwan; Kc, Dukka B

    2016-01-01

    Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10-15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors.
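
    The core SymD procedure, scoring a protein against itself under every circular permutation, is embarrassingly parallel over the shift values. A toy sketch with a crude identity score standing in for structural alignment (illustrative only):

        from multiprocessing import Pool

        SEQ = "ABCDEABCDEABCDE"   # toy sequence with an internal 5-residue repeat

        def shift_score(k):
            """Score the sequence against itself circularly permuted by k residues."""
            rotated = SEQ[k:] + SEQ[:k]
            matches = sum(a == b for a, b in zip(SEQ, rotated))
            return k, matches

        if __name__ == "__main__":
            with Pool() as pool:
                scores = pool.map(shift_score, range(1, len(SEQ)))
            best_k, best = max(scores, key=lambda s: s[1])
            print(f"best non-trivial shift: {best_k} (score {best})")  # shift 5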

  19. Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains

    PubMed Central

    Jha, Ashwani; Flurchick, K. M.; Bikdash, Marwan

    2016-01-01

    Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10–15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors. PMID:27747230
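
    The circular permutations that SymD scans are mutually independent, which is what makes the parallelization natural. The Python sketch below illustrates that pattern with a process pool; the toy scoring function merely counts positional matches between a chain and its rotation, standing in for SymD's actual structural self-alignment.

        from multiprocessing import Pool

        def alignment_score(args):
            # Placeholder for the structural self-alignment: score how well
            # the chain matches itself after a circular shift.
            seq, shift = args
            rotated = seq[shift:] + seq[:shift]
            return shift, sum(a == b for a, b in zip(seq, rotated))

        def best_symmetry_shift(seq, workers=4):
            # Each of the len(seq)-1 shifts is independent, so they can be
            # farmed out to a pool of processes (or nodes of a cluster).
            with Pool(workers) as pool:
                scores = pool.map(alignment_score,
                                  [(seq, s) for s in range(1, len(seq))])
            return max(scores, key=lambda t: t[1])

        if __name__ == "__main__":
            print(best_symmetry_shift("ABCDABCDABCD"))   # (4, 12): period-4 symmetry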

  20. Parallel DSMC Solution of Three-Dimensional Flow Over a Finite Flat Plate

    NASA Technical Reports Server (NTRS)

    Nance, Robert P.; Wilmoth, Richard G.; Moon, Bongki; Hassan, H. A.; Saltz, Joel

    1994-01-01

    This paper describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a good load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It is shown that the parallel algorithm produces results which compare well with experimental data. Moreover, it yields significantly faster execution times than the scalar code, as well as very good load-balance characteristics.
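
    One-dimensional chain partitioning assigns contiguous runs of cells to processors so that per-processor load stays roughly equal. A minimal greedy sketch in Python; the cell weights are hypothetical, and production partitioners often use exact probe-based algorithms rather than this heuristic.

        def chain_partition(weights, parts):
            # Split a 1D array of per-cell workloads into `parts` contiguous
            # chunks whose loads are close to the ideal average.
            target = sum(weights) / parts
            bounds, load, cuts_left = [0], 0.0, parts - 1
            for i, w in enumerate(weights):
                load += w
                remaining = len(weights) - (i + 1)
                must_cut = remaining == cuts_left   # keep every part non-empty
                if cuts_left and remaining >= cuts_left and (load >= target or must_cut):
                    bounds.append(i + 1)
                    load, cuts_left = 0.0, cuts_left - 1
            bounds.append(len(weights))
            return [(bounds[k], bounds[k + 1]) for k in range(parts)]

        # Hypothetical DSMC cell weights (e.g. particles per cell).
        print(chain_partition([5, 9, 4, 7, 3, 8, 6, 2, 7, 5, 4, 6], parts=3))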

  1. Broadband hybrid electromagnetic and piezoelectric energy harvesting from ambient vibrations and pneumatic vortices induced by running subway trains.

    DOT National Transportation Integrated Search

    2017-05-01

    The airfoil-based electromagnetic energy harvester containing parallel array motion between moving coil and trajectory matching multi-pole magnets was investigated. The magnets were aligned in an alternately magnetized formation of 6 magnets to...

  2. 3. View looking S down West Broad Street sidewalk showing ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    3. View looking S down West Broad Street sidewalk showing S half of Gate in foreground, Wickersham fence running parallel to West Broad St. and Passenger Station in background. - Central of Georgia Railway, Cotton Yard Gates, West Broad Street, Savannah, Chatham County, GA

  3. Enhancing the usability and performance of structured association mapping algorithms using automation, parallelization, and visualization in the GenAMap software system

    PubMed Central

    2012-01-01

    Background Structured association mapping is proving to be a powerful strategy to find genetic polymorphisms associated with disease. However, these algorithms are often distributed as command line implementations that require expertise and effort to customize and put into practice. Because of the difficulty required to use these cutting-edge techniques, geneticists often revert to simpler, less powerful methods. Results To make structured association mapping more accessible to geneticists, we have developed an automatic processing system called Auto-SAM. Auto-SAM enables geneticists to run structured association mapping algorithms automatically, using parallelization. Auto-SAM includes algorithms to discover gene-networks and find population structure. Auto-SAM can also run popular association mapping algorithms, in addition to five structured association mapping algorithms. Conclusions Auto-SAM is available through GenAMap, a front-end desktop visualization tool. GenAMap and Auto-SAM are implemented in JAVA; binaries for GenAMap can be downloaded from http://sailing.cs.cmu.edu/genamap. PMID:22471660

  4. Durham extremely large telescope adaptive optics simulation platform.

    PubMed

    Basden, Alastair; Butterley, Timothy; Myers, Richard; Wilson, Richard

    2007-03-01

    Adaptive optics systems are essential on all large telescopes for which image quality is important. These are complex systems with many design parameters requiring optimization before good performance can be achieved. The simulation of adaptive optics systems is therefore necessary to characterize the expected performance. We describe an adaptive optics simulation platform, developed at Durham University, which can be used to simulate adaptive optics systems on the largest proposed future extremely large telescopes as well as on current systems. This platform is modular, object oriented, and has the benefit of hardware application acceleration that can be used to improve the simulation performance, essential for ensuring that the run time of a given simulation is acceptable. The simulation platform described here can be highly parallelized using parallelization techniques suited for adaptive optics simulation, while still offering the user complete control as the simulation runs. The results from the simulation of a ground layer adaptive optics system are provided as an example to demonstrate the flexibility of this simulation platform.

  5. Integrated bioassays in microfluidic devices: botulinum toxin assays.

    PubMed

    Mangru, Shakuntala; Bentz, Bryan L; Davis, Timothy J; Desai, Nitin; Stabile, Paul J; Schmidt, James J; Millard, Charles B; Bavari, Sina; Kodukula, Krishna

    2005-12-01

    A microfluidic assay was developed for screening botulinum neurotoxin serotype A (BoNT-A) by using a fluorescent resonance energy transfer (FRET) assay. Molded silicone microdevices with integral valves, pumps, and reagent reservoirs were designed and fabricated. Electrical and pneumatic control hardware were constructed, and software was written to automate the assay protocol and data acquisition. Detection was accomplished by fluorescence microscopy. The system was validated with a peptide inhibitor, running 2 parallel assays, as a feasibility demonstration. The small footprint of each bioreactor cell (0.5 cm2) and scalable fluidic architecture enabled many parallel assays on a single chip. The chip is programmable to run a dilution series in each lane, generating concentration-response data for multiple inhibitors. The assay results showed good agreement with the corresponding experiments done at a macroscale level. Although the system has been developed for BoNT-A screening, a wide variety of assays can be performed on the microfluidic chip with little or no modification.

  6. The anatomy and biomechanics of running.

    PubMed

    Nicola, Terry L; Jewison, David J

    2012-04-01

    To understand the normal series of biomechanical events of running, a comparative assessment against walking is helpful. The closed kinetic chain through the lower extremities, control of the lumbopelvic mechanism, and overall symmetry of movement have been described well enough that deviations from normal movement can now be associated with specific overuse injuries experienced by runners. This information, combined with a history of errors in the runner's training program, will lead to a more comprehensive treatment and prevention plan for related injuries.

  7. The rid-redundant procedure in C-Prolog

    NASA Technical Reports Server (NTRS)

    Chen, Huo-Yan; Wah, Benjamin W.

    1987-01-01

    C-Prolog can conveniently be used for logical inferences on knowledge bases. However, as with many search methods that use backward chaining, a large number of redundant computations may be produced in recursive calls. To overcome this problem, the 'rid-redundant' procedure was designed to eliminate all redundant computations when running multi-recursive procedures. Experimental results obtained for C-Prolog on the VAX-11/780 computer show an order of magnitude improvement in the running time and solvable problem size.
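
    The Prolog source is not reproduced in this record, but the effect described is the classic one of caching solved subgoals. A hedged Python analogue, with memoization standing in for the rid-redundant procedure (which it resembles only in spirit):

        from functools import lru_cache

        calls = {"naive": 0, "memo": 0}

        def fib_naive(n):
            # Plain recursion re-derives the same subgoal many times over.
            calls["naive"] += 1
            return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

        @lru_cache(maxsize=None)
        def fib_memo(n):
            # Caching each solved subgoal eliminates the redundant
            # recomputation, in the spirit of tabling/rid-redundant.
            calls["memo"] += 1
            return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

        fib_naive(25); fib_memo(25)
        print(calls)   # {'naive': 242785, 'memo': 26}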

  8. Just in Time - Expecting Failure: Do JIT Principles Run Counter to DoD’s Business Nature?

    DTIC Science & Technology

    2014-04-01

    Regiment. The last several years witnessed both commercial industry and the Department of Defense (DoD) logistics supply chains trending toward an...moving items through a production system only when needed. Equating inventory to an avoidable waste instead of adding value to a company directly...Louisiana plant for a week, Honda Motor Company to suspend orders for Japanese-built Honda and Acura models, and producers of Boeing's 787 to run billions

  9. Free energy landscapes of short peptide chains using adaptively biased molecular dynamics

    NASA Astrophysics Data System (ADS)

    Karpusenka, Vadzim; Babin, Volodymyr; Roland, Christopher; Sagui, Celeste

    2009-03-01

    We present the results of a computational study of the free energy landscapes of short polypeptide chains, as a function of several reaction coordinates meant to distinguish between several known types of helices. The free energy landscapes were calculated using the recently developed adaptively biased molecular dynamics method, followed up with equilibrium "umbrella correction" runs. Specific polypeptides investigated include small chains of pure and mixed alanine, glutamate, leucine, lysine and methionine (all amino acids with strong helix-forming propensities), as well as glycine and proline (which have low helix-forming propensities), tyrosine, serine and arginine. Our results are consistent with the existing experimental and other theoretical evidence.

  10. Parallelization and checkpointing of GPU applications through program transformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solano-Quinde, Lizandro Damian

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for GPUs tractable has consolidated GPUs as an alternative for accelerating general-purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running on multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with a large memory footprint. Parallelizing single-GPU applications has been approached with libraries that distribute the workload at runtime; however, they impose execution overhead and are not portable. On traditional CPU systems, by contrast, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at the application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems presents new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.

  11. Use of parallel computing for analyzing big data in EEG studies of ambiguous perception

    NASA Astrophysics Data System (ADS)

    Maksimenko, Vladimir A.; Grubov, Vadim V.; Kirsanov, Daniil V.

    2018-02-01

    The problem of interaction between humans and machine systems through neuro-interfaces (or brain-computer interfaces) is an urgent task which requires the analysis of large amounts of neurophysiological EEG data. In the present paper we consider methods of parallel computing as one of the most powerful tools for processing experimental data in real time with respect to the multichannel structure of EEG. In this context we demonstrate the application of parallel computing to the estimation of the spectral properties of multichannel EEG signals associated with visual perception. Using the CUDA C library, we run a wavelet-based algorithm on GPUs and show the possibility of detecting specific patterns in a multichannel set of EEG data in real time.
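
    The paper's implementation is a wavelet transform in CUDA C; as a rough CPU-side analogue (an assumption, not the authors' code), the Python sketch below exploits the same channel-level independence by computing a per-channel power spectrum across a process pool.

        import numpy as np
        from multiprocessing import Pool

        def channel_power_spectrum(signal):
            # Spectral estimate for one EEG channel; channels are independent,
            # so they map cleanly onto parallel workers (or GPU blocks).
            spectrum = np.abs(np.fft.rfft(signal)) ** 2
            return spectrum / len(signal)

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            eeg = rng.standard_normal((32, 4096))   # 32 channels, hypothetical data
            with Pool(4) as pool:
                spectra = pool.map(channel_power_spectrum, list(eeg))
            print(len(spectra), spectra[0].shape)   # 32 (2049,)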

  12. MHD Code Optimizations and Jets in Dense Gaseous Halos

    NASA Astrophysics Data System (ADS)

    Gaibler, Volker; Vigelius, Matthias; Krause, Martin; Camenzind, Max

    We have further optimized and extended the 3D-MHD-code NIRVANA. The magnetized part runs in parallel, reaching 19 Gflops per SX-6 node, and has a passively advected particle population. In addition, the code is now MPI-parallel, on top of the shared memory parallelization. On a 512^3 grid, we reach 561 Gflops with 32 nodes on the SX-8. Also, we have successfully used FLASH on the Opteron cluster. Scientific results are preliminary so far. We report one computation of highly resolved cocoon turbulence. While we find some similarities to earlier 2D work by us and others, we note a strange reluctance of cold material to enter the low density cocoon, which has to be investigated further.

  13. A tool for simulating parallel branch-and-bound methods

    NASA Astrophysics Data System (ADS)

    Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail

    2016-01-01

    The Branch-and-Bound method is known as one of the most powerful but very resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. Therefore, the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.

  14. Dynamic file-access characteristics of a production parallel scientific workload

    NASA Technical Reports Server (NTRS)

    Kotz, David; Nieuwejaar, Nils

    1994-01-01

    Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.

  15. A parallel computational model for GATE simulations.

    PubMed

    Rannou, F R; Vega-Acevedo, N; El Bitar, Z

    2013-12-01

    GATE/Geant4 Monte Carlo simulations are computationally demanding applications, requiring thousands of processor hours to produce realistic results. The classical strategy of distributing the simulation of individual events does not apply efficiently for Positron Emission Tomography (PET) experiments, because it requires a centralized coincidence processing and large communication overheads. We propose a parallel computational model for GATE that handles event generation and coincidence processing in a simple and efficient way by decentralizing event generation and processing but maintaining a centralized event and time coordinator. The model is implemented with the inclusion of a new set of factory classes that can run the same executable in sequential or parallel mode. A Mann-Whitney test shows that the output produced by this parallel model in terms of number of tallies is equivalent (but not equal) to its sequential counterpart. Computational performance evaluation shows that the software is scalable and well balanced. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  16. Tutorial: Parallel Computing of Simulation Models for Risk Analysis.

    PubMed

    Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D

    2016-10-01

    Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.
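
    The tutorial's worked examples are in MATLAB and R; purely as an illustration of the embarrassingly parallel pattern it teaches, here is an analogous Python sketch in which independent Monte Carlo batches of a hypothetical risk model run across a process pool, each batch with its own seed.

        import numpy as np
        from multiprocessing import Pool

        def one_batch(seed, n=100_000):
            # Batches share no state, so they parallelize embarrassingly.
            rng = np.random.default_rng(seed)
            demand = rng.lognormal(mean=3.0, sigma=0.8, size=n)  # toy risk model
            return np.mean(demand > 60.0)     # P(outcome exceeds a threshold)

        if __name__ == "__main__":
            seeds = range(8)                  # distinct seeds keep streams independent
            with Pool(4) as pool:
                estimates = pool.map(one_batch, seeds)
            print(f"exceedance probability ~ {np.mean(estimates):.4f}")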

  17. Evaluation of a parallel implementation of the learning portion of the backward error propagation neural network: experiments in artifact identification.

    PubMed Central

    Sittig, D. F.; Orr, J. A.

    1991-01-01

    Various methods have been proposed in an attempt to solve problems in artifact and/or alarm identification including expert systems, statistical signal processing techniques, and artificial neural networks (ANN). ANNs consist of a large number of simple processing units connected by weighted links. To develop truly robust ANNs, investigators are required to train their networks on huge training data sets, requiring enormous computing power. We implemented a parallel version of the backward error propagation neural network training algorithm in the widely portable parallel programming language C-Linda. A maximum speedup of 4.06 was obtained with six processors. This speedup represents a reduction in total run-time from approximately 6.4 hours to 1.5 hours. We conclude that use of the master-worker model of parallel computation is an excellent method for obtaining speedups in the backward error propagation neural network training algorithm. PMID:1807607
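
    The original implementation used C-Linda, but the master-worker pattern is language-agnostic. In the hedged Python sketch below, workers compute gradients on shards of the training set and the master averages them for a synchronous update; a linear least-squares gradient stands in for the network's backward pass.

        import numpy as np
        from multiprocessing import Pool

        def shard_gradient(args):
            # Worker: mean-squared-error gradient on one shard of the data.
            w, X, y = args
            return X.T @ (X @ w - y) / len(y)

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            X, y = rng.standard_normal((1200, 3)), rng.standard_normal(1200)
            w = np.zeros(3)
            shards = np.array_split(np.arange(1200), 6)   # one shard per worker
            with Pool(6) as pool:                         # the master process
                for _ in range(50):                       # synchronous epochs
                    grads = pool.map(shard_gradient,
                                     [(w, X[s], y[s]) for s in shards])
                    w -= 0.1 * np.mean(grads, axis=0)     # combine and update
            print(w)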

  18. Second Evaluation of Job Queuing/Scheduling Software. Phase 1

    NASA Technical Reports Server (NTRS)

    Jones, James Patton; Brickell, Cristy; Chancellor, Marisa (Technical Monitor)

    1997-01-01

    The recent proliferation of high performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, NAS compiled a requirements checklist for job queuing/scheduling software. Next, NAS evaluated the leading job management system (JMS) software packages against the checklist. A year has now elapsed since the first comparison was published, and NAS has repeated the evaluation. This report describes this second evaluation and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still lacking; however, definite progress has been made by the vendors to correct the deficiencies. This report is supplemented by a WWW interface to the data collected, to aid other sites in extracting the evaluation information on specific requirements of interest.

  19. Symplectic molecular dynamics simulations on specially designed parallel computers.

    PubMed

    Borstnik, Urban; Janezic, Dusanka

    2005-01-01

    We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that less time is required and fewer steps are needed and so enables fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.

  20. 3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

    PubMed Central

    Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco

    2014-01-01

    The Nonlocal Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity leads researchers to the development of parallel programming approaches and the use of massively parallel architectures such as GPUs. In recent years, GPU devices have made it possible to achieve reasonable running times by filtering 3D datasets slice-by-slice with a 2D NLM algorithm. In our approach we design and implement a fully 3D Nonlocal Means parallel approach, adopting different algorithm mapping strategies on GPU architectures and a multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained encourage the use of our approach in a large spectrum of application scenarios such as magnetic resonance imaging (MRI) or video sequence denoising. PMID:25045397

  1. Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Taylor, Arthur C., III

    1994-01-01

    This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.

  2. Supply chain dynamics in healthcare services.

    PubMed

    Samuel, Cherian; Gonapa, Kasiviswanadh; Chaudhary, P K; Mishra, Ananya

    2010-01-01

    The purpose of this paper is to analyse health service supply chain systems. A great deal of literature is available on supply chain management in finished goods inventory situations; however, little research exists on managing service capacity when finished goods inventories are absent. System dynamics models for a typical service-oriented supply chain such as healthcare processes are developed, wherein three service stages are presented sequentially. Just like supply chains with finished goods inventory, healthcare service supply chains also show dynamic behaviour. A comparison of options showed that reducing capacity adjustment delays and service delays gives better results. The study is confined to health service-oriented supply chains. Further work includes extending the study to service-oriented supply chains with parallel processing, i.e. having more than one stage to perform a similar operation, and also to study the behaviour in service-oriented supply chains that have re-entrant orders and applications. Specific case studies can also be developed to reveal factors relevant to particular service-oriented supply chains. The paper explains the bullwhip effect in healthcare service-oriented supply chains. Reducing stages and capacity adjustment delays are strategic options for service-oriented supply chains. The paper throws light on policy options for managing healthcare service-oriented supply chain dynamics.

  3. catena-Poly[[[4,6-bis(2-pyridyl)-1,3,5-triazin-2-olato]copper(II)]-μ-chlorido].

    PubMed

    Cao, Man-Li

    2011-06-01

    The title compound, [Cu(C(13)H(8)N(5)O)Cl](n), has a chain structure parallel to [100] with Cu(2+) cations in a trigonal-bipyramidal coordination environment. The ligand adopts a tridentate tripyridyl coordination mode and a chloride ion acts as a bridge. The chains are linked via weak C-H⋯O and C-H⋯Cl hydrogen bonds into a three-dimensional supra-molecular network.

  4. Anisotropy effects in the ferromagnetic quantum chain systems (C 6H 11NH 3)CuCl 3 (CHAC) and (C 6H 11NH 3)CuBr 3 (CHAB)

    NASA Astrophysics Data System (ADS)

    Kopinga, K.; Nishihara, H.; De Jonge, W. J. M.

    1983-02-01

    Heat capacity and magnetization measurements on the title compounds revealed that they are very good approximations of a ferromagnetic S = 1/2 Heisenberg chain system. The small anisotropy present in these compounds gives rise to very pronounced cross-over effects. In CHAC, the cross-over temperature is increased by a magnetic field parallel to the easy axis.

  5. Scaling up antiretroviral therapy in Uganda: using supply chain management to appraise health systems strengthening

    PubMed Central

    2011-01-01

    Background Strengthened national health systems are necessary for effective and sustained expansion of antiretroviral therapy (ART). ART and its supply chain management in Uganda are largely based on parallel and externally supported efforts. The question arises whether systems are being strengthened to sustain access to ART. This study applies systems thinking to assess supply chain management, the role of external support and whether investments create the needed synergies to strengthen health systems. Methods This study uses the WHO health systems framework and examines the issues of governance, financing, information, human resources and service delivery in relation to supply chain management of medicines and technologies. It looks at links and causal chains between supply chain management for ART and the national supply system for essential drugs. It combines data from the literature and key informant interviews with observations at health service delivery level in a study district. Results Current drug supply chain management in Uganda is characterized by parallel processes and information systems that result in poor quality and inefficiencies. Less than expected health system performance, stock outs and other shortages affect ART and primary care in general. Poor performance of supply chain management is amplified by weak conditions at all levels of the health system, including the areas of financing, governance, human resources and information. Governance issues include the failure to follow up on initial policy intentions and a focus on narrow, short-term approaches. Conclusion The opportunity and need to use ART investments for an essential supply chain management and strengthened health system has not been exploited. By applying a systems perspective this work indicates the seriousness of missing system prerequisites. The findings suggest that root causes and capacities across the system have to be addressed synergistically to enable systems that can match and accommodate investments in disease-specific interventions. The multiplicity and complexity of existing challenges require a long-term and systems perspective, essentially in contrast to the current short-term and program-specific nature of external assistance. PMID:21806826

  6. Chaining direct memory access data transfer operations for compute nodes in a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.

    2010-09-28

    Methods, systems, and products are disclosed for chaining DMA data transfer operations for compute nodes in a parallel computer that include: receiving, by an origin DMA engine on an origin node in an origin injection FIFO buffer for the origin DMA engine, a RGET data descriptor specifying a DMA transfer operation data descriptor on the origin node and a second RGET data descriptor on the origin node, the second RGET data descriptor specifying a target RGET data descriptor on the target node, the target RGET data descriptor specifying an additional DMA transfer operation data descriptor on the origin node; creating, by the origin DMA engine, an RGET packet in dependence upon the RGET data descriptor, the RGET packet containing the DMA transfer operation data descriptor and the second RGET data descriptor; and transferring, by the origin DMA engine to a target DMA engine on the target node, the RGET packet.
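
    The claim language is dense, but the underlying idea is that an RGET descriptor can itself carry further descriptors, so one injection triggers a chain of transfers with no processor involvement in between. A toy Python model of that structure (all names here are illustrative, not the patent's):

        from dataclasses import dataclass, field

        @dataclass
        class PutDescriptor:          # a plain DMA transfer operation
            label: str

        @dataclass
        class RgetDescriptor:         # a remote get carrying payload descriptors
            payload: list = field(default_factory=list)

        def process_fifo(fifo, log):
            # Toy DMA engine: executing an RGET injects its payload descriptors
            # back into the FIFO, so a single injection chains many transfers.
            while fifo:
                desc = fifo.pop(0)
                if isinstance(desc, PutDescriptor):
                    log.append("transfer " + desc.label)
                else:
                    fifo.extend(desc.payload)

        # One RGET carries a transfer plus a second RGET, mirroring the
        # chained structure described in the claim.
        chain = RgetDescriptor([PutDescriptor("A"),
                                RgetDescriptor([PutDescriptor("B")])])
        log = []
        process_fifo([chain], log)
        print(log)                    # ['transfer A', 'transfer B']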

  7. Macromolecular ab initio phasing enforcing secondary and tertiary structure.

    PubMed

    Millán, Claudia; Sammito, Massimo; Usón, Isabel

    2015-01-01

    Ab initio phasing of macromolecular structures, from the native intensities alone with no experimental phase information or previous particular structural knowledge, has been the object of a long quest, limited by two main barriers: structure size and resolution of the data. Current approaches to extend the scope of ab initio phasing include use of the Patterson function, density modification and data extrapolation. The authors' approach relies on the combination of locating model fragments such as polyalanine α-helices with the program PHASER and density modification with the program SHELXE. Given the difficulties in discriminating correct small substructures, many putative groups of fragments have to be tested in parallel; thus calculations are performed in a grid or supercomputer. The method has been named after the Italian painter Arcimboldo, who used to compose portraits out of fruit and vegetables. With ARCIMBOLDO, most collections of fragments remain a 'still-life', but some are correct enough for density modification and main-chain tracing to reveal the protein's true portrait. Beyond α-helices, other fragments can be exploited in an analogous way: libraries of helices with modelled side chains, β-strands, predictable fragments such as DNA-binding folds or fragments selected from distant homologues up to libraries of small local folds that are used to enforce nonspecific tertiary structure; thus restoring the ab initio nature of the method. Using these methods, a number of unknown macromolecules with a few thousand atoms and resolutions around 2 Å have been solved. In the 2014 release, use of the program has been simplified. The software mediates the use of massive computing to automate the grid access required in difficult cases but may also run on a single multicore workstation (http://chango.ibmb.csic.es/ARCIMBOLDO_LITE) to solve straightforward cases.

  8. Xyce parallel electronic simulator users guide, version 6.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  9. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  10. Xyce parallel electronic simulator users guide, version 6.0.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  11. A compositional reservoir simulator on distributed memory parallel computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rame, M.; Delshad, M.

    1995-12-31

    This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results on the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer floods and polymer floods. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.

  12. Automatic target recognition apparatus and method

    DOEpatents

    Baumgart, Chris W.; Ciarcia, Christopher A.

    2000-01-01

    An automatic target recognition apparatus (10) is provided, having a video camera/digitizer (12) for producing a digitized image signal (20) representing an image containing therein objects which objects are to be recognized if they meet predefined criteria. The digitized image signal (20) is processed within a video analysis subroutine (22) residing in a computer (14) in a plurality of parallel analysis chains such that the objects are presumed to be lighter in shading than the background in the image in three of the chains and further such that the objects are presumed to be darker than the background in the other three chains. In two of the chains the objects are defined by surface texture analysis using texture filter operations. In another two of the chains the objects are defined by background subtraction operations. In yet another two of the chains the objects are defined by edge enhancement processes. In each of the analysis chains a calculation operation independently determines an error factor relating to the probability that the objects are of the type which should be recognized, and a probability calculation operation combines the results of the analysis chains.
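
    As a very loose illustration of the multi-chain idea (not the patented method), the Python sketch below runs three toy analysis chains under each brightness assumption and averages their scores; a real system would execute the chains concurrently and calibrate each chain's error factor as a probability.

        import numpy as np

        def chain_background(img, bright):   # background-subtraction chain
            mask = img > img.mean() if bright else img < img.mean()
            return mask.mean()               # toy score: candidate-blob fraction

        def chain_edges(img, bright):        # edge-enhancement chain
            return np.abs(np.diff(img.astype(float), axis=1)).mean() / 255.0

        def chain_texture(img, bright):      # surface-texture chain
            return img.std() / 255.0

        def recognize(img):
            # Six parallel analysis chains: three per brightness assumption.
            chains = [chain_background, chain_edges, chain_texture]
            scores = [c(img, bright) for bright in (True, False) for c in chains]
            return float(np.mean(scores))    # combined recognition score

        img = np.zeros((16, 16), dtype=np.uint8)
        img[4:12, 4:12] = 200                # bright square on a dark background
        print(round(recognize(img), 3))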

  13. Recent advances in PDF modeling of turbulent reacting flows

    NASA Technical Reports Server (NTRS)

    Leonard, Andrew D.; Dai, F.

    1995-01-01

    This viewgraph presentation concludes that a Monte Carlo probability density function (PDF) solution successfully couples with an existing finite volume code; that the PDF solution method applied to turbulent reacting flows shows good agreement with data; and that PDF methods must be run on parallel machines for practical use.

  14. Parallel noise barrier prediction procedure : report 2 user's manual revision 1

    DOT National Transportation Integrated Search

    1987-11-01

    This report defines the parameters which are used to input the data required to run Program Barrier and BarrierX on a microcomputer such as an IBM PC or compatible. Directions for setting up and operating a working disk are presented. Examples of inp...

  15. Forces and mechanical energy fluctuations during diagonal stride roller skiing; running on wheels?

    PubMed

    Kehler, Alyse L; Hajkova, Eliska; Holmberg, Hans-Christer; Kram, Rodger

    2014-11-01

    Mechanical energy can be conserved during terrestrial locomotion in two ways: the inverted pendulum mechanism for walking and the spring-mass mechanism for running. Here, we investigated whether diagonal stride cross-country roller skiing (DIA) utilizes similar mechanisms. Based on previous studies, we hypothesized that running and DIA would share similar phase relationships and magnitudes of kinetic energy (KE), and gravitational potential energy (GPE) fluctuations, indicating elastic energy storage and return, as if roller skiing is like 'running on wheels'. Experienced skiers (N=9) walked and ran at 1.25 and 3 m s(-1), respectively, and roller skied with DIA at both speeds on a level dual-belt treadmill that recorded perpendicular and parallel forces. We calculated the KE and GPE of the center of mass from the force recordings. As expected, the KE and GPE fluctuated with an out-of-phase pattern during walking and an in-phase pattern during running. Unlike walking, during DIA, the KE and GPE fluctuations were in phase, as they are in running. However, during the glide phase, KE was dissipated as frictional heat and could not be stored elastically in the tendons, as in running. Elastic energy storage and return epitomize running and thus we reject our hypothesis. Diagonal stride cross-country skiing is a biomechanically unique movement that only superficially resembles walking or running. © 2014. Published by The Company of Biologists Ltd.

  16. cellGPU: Massively parallel simulations of dynamic vertex models

    NASA Astrophysics Data System (ADS)

    Sussman, Daniel M.

    2017-10-01

    Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations.
    Program Files doi: http://dx.doi.org/10.17632/6j2cj29t3r.1
    Licensing provisions: MIT
    Programming language: CUDA/C++
    Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate.
    Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU.
    Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation

  17. Learning from Physical Analogies: A Study in Analogy and the Explanation Process

    DTIC Science & Technology

    1988-12-27

    support of the various transfer operations, the forward chaining ATRE rule system is paired with an abductive retriever. This is a backward chaining... When a new datum is entered in the database, ATRE exhaustively runs all rules made executable by the datum's presence in a forward...

  18. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Kwan-Liu

    Most of today's visualization libraries and applications are based on what is known today as the visualization pipeline. In the visualization pipeline model, algorithms are encapsulated as "filtering" components with inputs and outputs. These components can be combined by connecting the outputs of one filter to the inputs of another filter. The visualization pipeline model is popular because it provides a convenient abstraction that allows users to combine algorithms in powerful ways. Unfortunately, the visualization pipeline cannot run effectively on exascale computers. Experts agree that the exascale machine will comprise processors that contain many cores. Furthermore, physical limitations will prevent data movement in and out of the chip (that is, between main memory and the processing cores) from keeping pace with improvements in overall compute performance. To use these processors to their fullest capability, it is essential to carefully consider memory access. This is where the visualization pipeline fails. Each filtering component in the visualization library is expected to take a data set in its entirety, perform some computation across all of the elements, and output the complete results. The process of iterating over all elements must be repeated in each filter, which is one of the worst possible ways to traverse memory when trying to maximize the number of executions per memory access. This project investigates a new type of visualization framework that exhibits the pervasive parallelism necessary to run on exascale machines. Our framework achieves this by defining algorithms in terms of functors, which are localized, stateless operations. Functors can be composited in much the same way as filters in the visualization pipeline. But the functors' design allows them to run concurrently on massive numbers of lightweight threads. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale computer. This project concludes with a functional prototype containing pervasively parallel algorithms that perform demonstratively well on many-core processors. These algorithms are fundamental for performing data analysis and visualization at extreme scale.
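
    The functor idea can be illustrated compactly: stateless per-element operations are fused and applied in a single traversal, instead of each pipeline filter sweeping the whole dataset. A minimal Python sketch (the report's actual framework targets GPUs and many-core processors; the filter names below are invented):

        from functools import reduce

        def compose(*functors):
            # Fuse a chain of stateless per-element operations into one
            # function, so the data is traversed once rather than once
            # per pipeline stage (one value in, one value out, no state).
            return lambda x: reduce(lambda v, f: f(v), functors, x)

        celsius = lambda k: k - 273.15       # unit-conversion "filter"
        clamp   = lambda t: max(t, 0.0)      # clamping "filter"
        contour = lambda t: round(t / 10.0)  # binning "filter" for contouring

        fused = compose(celsius, clamp, contour)
        data = [250.0, 280.0, 300.0, 320.0]
        print([fused(x) for x in data])      # single pass: [0, 1, 3, 5]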

  19. Controlling the Morphology of Side Chain Liquid Crystalline Block Copolymer Thin Films through Variations in Liquid Crystalline Content

    PubMed Central

    Verploegen, Eric; Zhang, Tejia; Jung, Yeon Sik; Ross, Caroline; Hammond, Paula T.

    2009-01-01

    In this paper we describe methods for manipulating the morphology of side-chain liquid crystalline block copolymers through variations in the liquid crystalline content. By systematically controlling the covalent attachment of side chain liquid crystals to a block copolymer (BCP) backbone, the morphology of both the liquid crystalline (LC) mesophase and the phase segregated BCP microstructures can be precisely manipulated. Increases in LC functionalization lead to stronger preferences for the anchoring of the LC mesophase relative to the substrate and the inter-material dividing surface (IMDS). By manipulating the strength of these interactions the arrangement and ordering of the ultrathin film block copolymer nanostructures can be controlled, yielding a range of morphologies that includes perpendicular and parallel cylinders, as well as both perpendicular and parallel lamellae. Additionally, we demonstrate the utilization of selective etching to create a nanoporous liquid crystalline polymer thin film. The unique control over the orientation and order of the self-assembled morphologies with respect to the substrate will allow for the custom design of thin films for specific nano-patterning applications without manipulation of the surface chemistry or the application of external fields. PMID:18763835

  20. Synchronized Molecular-Dynamics simulation for thermal lubrication of a polymeric liquid between parallel plates

    NASA Astrophysics Data System (ADS)

    Yasuda, Shugo; Yamamoto, Ryoichi

    2015-11-01

    The Synchronized Molecular-Dynamics (SMD) simulation recently proposed by the authors is applied to the analysis of polymer lubrication between parallel plates. In the SMD method, MD simulations are assigned to small fluid elements to calculate the local stresses and temperatures and are synchronized at certain time intervals to satisfy the macroscopic heat- and momentum-transport equations. The rheological properties and conformation of the polymer chains coupled with local viscous heating are investigated with a non-dimensional parameter, the Nahme-Griffith number, which is defined as the ratio of the viscous heating to the thermal conduction at the characteristic temperature required to sufficiently change the viscosity. The present simulation demonstrates that strong shear thinning and a transitional behavior of the conformation of the polymer chains are exhibited, with a rapid temperature rise, when the Nahme-Griffith number exceeds unity. The results also clarify that the reentrant transition of the linear stress-optical relation occurs for large shear stresses due to the coupling of the conformation of polymer chains with heat generation under shear flows. This study was financially supported by JSPS KAKENHI Grant Nos. 26790080 and 26247069.

  1. Controlling the morphology of side chain liquid crystalline block copolymer thin films through variations in liquid crystalline content.

    PubMed

    Verploegen, Eric; Zhang, Tejia; Jung, Yeon Sik; Ross, Caroline; Hammond, Paula T

    2008-10-01

    In this paper, we describe methods for manipulating the morphology of side-chain liquid crystalline block copolymers through variations in the liquid crystalline content. By systematically controlling the covalent attachment of side chain liquid crystals to a block copolymer (BCP) backbone, the morphology of both the liquid crystalline (LC) mesophase and the phase-segregated BCP microstructures can be precisely manipulated. Increases in LC functionalization lead to stronger preferences for the anchoring of the LC mesophase relative to the substrate and the intermaterial dividing surface. By manipulating the strength of these interactions, the arrangement and ordering of the ultrathin film block copolymer nanostructures can be controlled, yielding a range of morphologies that includes perpendicular and parallel cylinders, as well as both perpendicular and parallel lamellae. Additionally, we demonstrate the utilization of selective etching to create a nanoporous liquid crystalline polymer thin film. The unique control over the orientation and order of the self-assembled morphologies with respect to the substrate will allow for the custom design of thin films for specific nanopatterning applications without manipulation of the surface chemistry or the application of external fields.

  2. Dietary Echium Oil Increases Long-Chain n–3 PUFAs, Including Docosapentaenoic Acid, in Blood Fractions and Alters Biochemical Markers for Cardiovascular Disease Independently of Age, Sex, and Metabolic Syndrome

    PubMed Central

    Kuhnt, Katrin; Fuhrmann, Claudia; Köhler, Melanie; Kiehntopf, Michael; Jahreis, Gerhard

    2014-01-01

    Dietary supplementation with echium oil (EO) containing stearidonic acid (SDA) is a plant-based strategy to improve long-chain (LC) n–3 (ω-3) polyunsaturated fatty acid (PUFA) status in humans. We investigated the effect of EO on LC n–3 PUFA accumulation in blood and biochemical markers with respect to age, sex, and metabolic syndrome. This double-blind, parallel-arm, randomized controlled study started with a 2-wk run-in period, during which participants (n = 80) were administered 17 g/d run-in oil. Normal-weight individuals from 2 age groups (20–35 and 49–69 y) were allotted to EO or fish oil (FO; control) groups. During the 8-wk intervention, participants were administered either 17 g/d EO (2 g SDA; n = 59) or FO [1.9 g eicosapentaenoic acid (EPA); n = 19]. Overweight individuals with metabolic syndrome (n = 19) were recruited for EO treatment only. During the 10-wk study, the participants followed a dietary n–3 PUFA restriction, e.g., no fish. After the 8-wk EO treatment, increases in the LC n–3 metabolites EPA (168% and 79%) and docosapentaenoic acid [DPA (68% and 39%)] were observed, whereas docosahexaenoic acid (DHA) decreased (−5% and −23%) in plasma and peripheral blood mononuclear cells, respectively. Compared with FO, the efficacy of EO to increase EPA and DPA in blood was significantly lower (∼25% and ∼50%, respectively). A higher body mass index (BMI) was associated with lower relative and net increases in EPA and DPA. Compared with baseline, EO significantly reduced serum cholesterol, LDL cholesterol, oxidized LDL, and triglyceride (TG), but also HDL cholesterol, regardless of age and BMI. In the FO group, only TG decreased. Overall, daily intake of 15–20 g EO increased EPA and DPA in blood but had no influence on DHA. EO lowered cardiovascular risk markers, e.g., serum TG, which is particularly relevant for individuals with metabolic syndrome. Natural EO could be a noteworthy source of n–3 PUFA in human nutrition. This trial was registered at clinicaltrials.gov as NCT01856179. PMID:24553695

  3. Parallel family trees for transfer matrices in the Potts model

    NASA Astrophysics Data System (ADS)

    Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo

    2015-02-01

    The computational cost of transfer matrix methods for the Potts model is related to the question: in how many ways can two layers of a lattice be connected? Answering the question leads to the generation of a combinatorial set of lattice configurations. This set defines the configuration space of the problem, and the smaller it is, the faster the transfer matrix can be computed. The configuration space of generic (q, v) transfer matrix methods for strips is in the order of the Catalan numbers, which grow asymptotically as O(4^m), where m is the width of the strip. Other transfer matrix methods with a smaller configuration space indeed exist, but they make assumptions on the temperature or number of spin states, or restrict the structure of the lattice. In this paper we propose a parallel algorithm that uses a sub-Catalan configuration space of O(3^m) to build the generic (q, v) transfer matrix in a compressed form. The improvement is achieved by grouping the original set of Catalan configurations into a forest of family trees, in such a way that the solution to the problem is now computed by solving the root node of each family. As a result, the algorithm becomes exponentially faster than the Catalan approach while still highly parallel. The resulting matrix is stored in a compressed form using O(3^m × 4^m) space, making numerical evaluation and decompression faster than evaluating the matrix in its O(4^m × 4^m) uncompressed form. Experimental results for different sizes of strip lattices show that the parallel family trees (PFT) strategy indeed runs exponentially faster than the Catalan Parallel Method (CPM), especially when dealing with dense transfer matrices. In terms of parallel performance, we report strong-scaling speedups of up to 5.7× when running on an 8-core shared memory machine and 28× for a 32-core cluster. The best balance of speedup and efficiency for the multi-core machine was achieved when using p = 4 processors, while for the cluster scenario it was in the range p ∈ [8, 10]. Because of the parallel capabilities of the algorithm, a large-scale execution of the parallel family trees strategy on a supercomputer could contribute to the study of wider strip lattices.

  4. Negativity as the entanglement measure to probe the Kondo regime in the spin-chain Kondo model

    NASA Astrophysics Data System (ADS)

    Bayat, Abolfazl; Sodano, Pasquale; Bose, Sougato

    2010-02-01

    We study the entanglement of an impurity at one end of a spin chain with a block of spins, using negativity as a true measure of entanglement to characterize the unique features of the gapless Kondo regime in the spin-chain Kondo model. For this spin chain in the Kondo regime we determine, with a true entanglement measure, the spatial extent of the Kondo screening cloud, propose an ansatz for its ground state, and demonstrate that the impurity spin is indeed maximally entangled with the cloud. To better evidence the peculiarities of the Kondo regime, we carry out a parallel analysis of the entanglement properties of the Kondo spin-chain model in the gapped dimerized regime. Our study shows how a genuine entanglement measure stemming from quantum information theory can also fully characterize nonperturbative regimes accessible to certain condensed matter systems.
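
    For reference, negativity is directly computable for small systems: take the partial transpose of the density matrix over one subsystem and sum the absolute values of its negative eigenvalues. A minimal two-qubit illustration in Python (ours, not the authors' spin-chain computation), using the Werner state, whose negativity is nonzero exactly when p > 1/3:

    ```python
    # Negativity of a two-qubit state: sum of |negative eigenvalues|
    # of the partial transpose of the density matrix.
    import numpy as np

    def negativity(rho: np.ndarray) -> float:
        # Reshape the 4x4 density matrix into (A, B, A', B') indices,
        # transpose subsystem B (swap axes 1 and 3), and flatten back.
        rho_tb = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
        eigvals = np.linalg.eigvalsh(rho_tb)
        return float(-eigvals[eigvals < 0].sum())

    # Werner state: p * |singlet><singlet| + (1 - p)/4 * identity.
    psi = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2.0)
    singlet = np.outer(psi, psi)
    for p in (0.2, 1.0 / 3.0, 0.6, 1.0):
        rho = p * singlet + (1.0 - p) * np.eye(4) / 4.0
        print(f"p = {p:.3f}  negativity = {negativity(rho):.4f}")  # > 0 iff p > 1/3
    ```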

  5. Individuals' knowledge and practices of the cold chain.

    PubMed

    Uçar, Aslı; Ozçelik, Ayşe Özfer

    2013-01-01

    This study aims to identify the influence of education on the practices and knowledge of consumers for protecting and maintaining the cold chain in the Turkish capital, Ankara. Data were gathered using a questionnaire. Participants were 700 randomly selected volunteering adults. The majority of the participants had a university degree (69.0%) and did not know the definition of the cold chain, though they had some knowledge about it; differences existed between primary school and university graduates. Consumers' attitude scores for maintaining the cold chain were found to increase in parallel with education level. The proportion of people who knew the correct refrigerator temperature and the coldest part of the refrigerator, and who checked whether shops store products correctly, was highest among university graduates. Adults were observed to believe that shop assistants were responsible for maintaining the cold chain. However, the actual role of consumers in this process underscores the importance of education for individuals.

  6. Sex-related differences in the wheel-running activity of mice decline with increasing age.

    PubMed

    Bartling, Babett; Al-Robaiy, Samiya; Lehnich, Holger; Binder, Leonore; Hiebl, Bernhard; Simm, Andreas

    2017-01-01

    Laboratory mice of both sexes with free access to running wheels are commonly used to study mechanisms underlying the beneficial effects of physical exercise on health and aging in humans. However, comparative wheel-running activity profiles of male and female mice over a long period, during which increasing age plays an additional role, have been unknown. Therefore, we continuously recorded the wheel-running activity (i.e., total distance, median velocity, time of breaks) of female and male mice until 9 months of age. Our records indicated higher wheel-running distances for females than for males, which were highest in 2-month-old mice. This was mainly achieved by higher running velocities of the females, not by longer running times. However, the sex-related differences declined in parallel with the age-associated reduction in wheel-running activity. Female mice also showed greater variance between weekly running distances than males, recorded most often for females 4-6 months old but not older. Additional records of 24-month-old mice of both sexes indicated greatly reduced wheel-running activity at old age. Surprisingly, this reduction at old age resulted mainly from lower running velocities, not from shorter running times. Old mice also differed in their course of night activity, which peaked later compared with younger mice. In summary, we demonstrated the influence of sex on the age-dependent activity profile of mice, which contrasts somewhat with humans; this has to be considered when transferring exercise-mediated mechanisms from mouse to human. Copyright © 2016. Published by Elsevier Inc.

  7. catena-Poly[[[4,6-bis(2-pyridyl)-1,3,5-triazin-2-olato]copper(II)]-μ-chlorido

    PubMed Central

    Cao, Man-Li

    2011-01-01

    The title compound, [Cu(C13H8N5O)Cl]n, has a chain structure parallel to [100] with Cu2+ cations in a trigonal–bipyramidal coordination environment. The ligand adopts a tridentate tripyridyl coordination mode and a chloride ion acts as a bridge. The chains are linked via weak C—H⋯O and C—H⋯Cl hydrogen bonds into a three-dimensional supramolecular network. PMID:21754632

  8. rfpipe: Radio interferometric transient search pipeline

    NASA Astrophysics Data System (ADS)

    Law, Casey J.

    2017-10-01

    rfpipe supports Python-based analysis of radio interferometric data (especially from the Very Large Array) and searches for fast radio transients. It extends the rtpipe library (ascl:1706.002) with new approaches to parallelization, acceleration, and more portable data products. rfpipe can run in standalone mode or be run in a cluster environment.

  9. A Configuration Framework and Implementation for the Least Privilege Separation Kernel

    DTIC Science & Technology

    2010-12-01

    The Altova Web site states that virtualization software, Parallels for Mac and Wine, is required for running it on MacOS and RedHat Linux…

  10. Parallel Algorithm Solves Coupled Differential Equations

    NASA Technical Reports Server (NTRS)

    Hayashi, A.

    1987-01-01

    Numerical methods are adapted to concurrent processing. The algorithm solves a set of coupled partial differential equations by numerical integration. Adapted to run on a hypercube computer, the algorithm separates the problem into smaller problems that are solved concurrently. The increase in computing speed with concurrent processing over conventional sequential processing is appreciable, especially for large problems.
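
    As a generic illustration of this decomposition idea (ours; the report's hypercube implementation is not shown), the sketch below advances an explicit 1D heat-equation update by splitting the grid into chunks that workers update concurrently, with one ghost cell per side carrying the coupling between neighboring chunks:

    ```python
    # Domain decomposition for an explicit 1D heat-equation update:
    # each worker advances one chunk of the grid; one ghost cell per
    # side carries the coupling to neighboring chunks between steps.
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    def update_chunk(u_pad: np.ndarray, r: float) -> np.ndarray:
        """One explicit Euler step on a chunk, given ghost cells at both ends."""
        return u_pad[1:-1] + r * (u_pad[2:] - 2.0 * u_pad[1:-1] + u_pad[:-2])

    n, nchunks, r, steps = 402, 4, 0.25, 500   # r = dt/dx^2 <= 0.5 for stability
    u = np.zeros(n)
    u[n // 2] = 1.0                            # initial heat spike; ends held at 0
    bounds = np.linspace(1, n - 1, nchunks + 1, dtype=int)  # split interior points

    with ThreadPoolExecutor(max_workers=nchunks) as pool:
        for _ in range(steps):
            pads = [u[lo - 1:hi + 1] for lo, hi in zip(bounds[:-1], bounds[1:])]
            chunks = pool.map(update_chunk, pads, [r] * nchunks)
            u[1:-1] = np.concatenate(list(chunks))  # reassemble; ghosts refresh next step

    print(f"total heat = {u.sum():.4f} (conserved up to boundary losses)")
    ```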

  11. Internationalising Professional Skill Development: Are the Rich Getting Richer?

    ERIC Educational Resources Information Center

    Soontiens, Werner

    2004-01-01

    Internationalisation of education, and more specifically tertiary education, all over the world has contributed to a significant overhaul in student composition. Parallel to this runs the need for graduates to leave university with a range of professional skills. In response to this, universities actively encourage the development of such skills…

  12. Optimization of a pavement instrumentation plan for a full-scale test road : evaluation, [summary].

    DOT National Transportation Integrated Search

    2014-04-01

    The Florida Department of Transportation (FDOT) has begun planning for a concrete test road that will run parallel to US-301 for 2.5 miles in Bradford County. Test road construction begins in 2016. The road's 52 segments will enable real-wo...

  13. Ghost writer | ASCR Discovery

    Science.gov Websites

    …the outer membrane protein OprF of Pseudomonas aeruginosa… NWChem was designed to run on networked processors, as in an HPC system, using one-sided communication, says Jeff Hammond of Intel Corp.'s Parallel Computing Laboratory. In one-sided communication, a…
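
    The snippet breaks off mid-sentence, but the concept it names is standard: in one-sided communication, a process reads or writes another process's exposed memory without the target's active participation. A minimal sketch using mpi4py's RMA interface (our illustration; NWChem itself builds on the Global Arrays toolkit rather than this code):

    ```python
    # One-sided communication with MPI RMA: each rank exposes a buffer
    # in a window; neighbors Get() from it without the owner's involvement.
    # Run with: mpiexec -n 4 python onesided.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local = np.full(1, float(rank))          # memory this rank exposes
    win = MPI.Win.Create(local, comm=comm)   # registration, not communication

    recv = np.empty(1)
    target = (rank + 1) % size

    win.Fence()                  # open an RMA epoch (collective)
    win.Get(recv, target)        # pull target's value; target does nothing
    win.Fence()                  # close the epoch; data now valid

    print(f"rank {rank} read {recv[0]:.0f} from rank {target}")
    win.Free()
    ```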

  14. Practical Application of Fundamental Concepts in Exercise Physiology

    ERIC Educational Resources Information Center

    Ramsbottom, R.; Kinch, R. F. T.; Morris, M. G.; Dennis, A. M.

    2007-01-01

    The collection of primary data in laboratory classes enhances undergraduate practical and critical thinking skills. The present article describes the use of a lecture program, running in parallel with a series of linked practical classes, that emphasizes classical or standard concepts in exercise physiology. The academic and practical program ran…

  15. INTERIOR VIEW, PASSAGE AND DOOR LETTING ONTO THE SOUTHEAST BED ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    INTERIOR VIEW, PASSAGE AND DOOR LETTING ONTO THE SOUTHEAST BED CHAMBER. THE ANGLED PASSAGE RUNS PARALLEL TO WHAT WAS AN EXTERIOR WALL OF THE THREE-SIDED WINDOW BOW PRESENT IN THE HOUSE’S ORIGINAL CA. 1770 STATE - The Woodlands, 4000 Woodlands Avenue, Philadelphia, Philadelphia County, PA

  16. Sentinel-1 data massive processing for large scale DInSAR analyses within Cloud Computing environments through the P-SBAS approach

    NASA Astrophysics Data System (ADS)

    Lanari, Riccardo; Bonano, Manuela; Buonanno, Sabatino; Casu, Francesco; De Luca, Claudio; Fusco, Adele; Manunta, Michele; Manzo, Mariarosaria; Pepe, Antonio; Zinno, Ivana

    2017-04-01

    The SENTINEL-1 (S1) mission is designed to provide operational capability for continuous mapping of the Earth thanks to its two polar-orbiting satellites (SENTINEL-1A and B) performing C-band synthetic aperture radar (SAR) imaging. It is characterized by enhanced revisit frequency, coverage, and reliability for operational services and applications requiring long SAR data time series. Moreover, SENTINEL-1 is specifically oriented to interferometry applications, with stringent requirements on attitude and orbit accuracy, and is intrinsically characterized by small spatial and temporal baselines. Consequently, SENTINEL-1 data are particularly suitable for exploitation through advanced interferometric techniques such as the well-known DInSAR algorithm referred to as the Small BAseline Subset (SBAS), which allows the generation of deformation time series and displacement velocity maps. In this work we present an advanced interferometric processing chain, based on the Parallel SBAS (P-SBAS) approach, for the massive processing of S1 Interferometric Wide Swath (IWS) data, aimed at generating deformation time series in an efficient, automatic, and systematic way. This DInSAR chain is designed to exploit distributed computing infrastructures, and more specifically Cloud Computing environments, to properly handle the storage and processing of huge S1 datasets. In particular, since S1 IWS data are acquired with the innovative Terrain Observation with Progressive Scans (TOPS) mode, we can benefit from the structure of S1 data, which are composed of bursts that can be treated as separate acquisitions. The processing is intrinsically parallelizable with respect to such independent input data, and we therefore exploit this coarse-granularity parallelization strategy in the majority of the steps of the SBAS processing chain. Moreover, we also implement more sophisticated parallelization approaches, exploiting both multi-node and multi-core programming techniques. Cloud Computing environments currently make available large collections of computing resources and storage that can be effectively exploited through the presented S1 P-SBAS processing chain to carry out interferometric analyses at very large scale in reduced time. This also allows us to address the problems connected with the use of the S1 P-SBAS chain in operational contexts, related to hazard monitoring and risk prevention and mitigation, where handling large amounts of data is a challenging task. As a significant experimental result, we performed a large-spatial-scale SBAS analysis of Central and Southern Italy by exploiting the Amazon Web Services Cloud Computing platform. In particular, we processed in parallel 300 S1 acquisitions covering the Italian peninsula from Lazio to Sicily through the presented S1 P-SBAS processing chain, generating 710 interferograms and thus finally obtaining the displacement time series of the whole processed area. This work has been partially supported by the CNR-DPC agreement, the H2020 EPOS-IP project (GA 676564) and the ESA GEP project.
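
    The coarse-granularity strategy described above, parallelism over independent bursts, maps naturally onto a worker pool. A schematic Python sketch (ours, with a hypothetical process_burst placeholder, not the P-SBAS code):

    ```python
    # Coarse-granularity parallelism over independent TOPS bursts:
    # each burst is an independent input, so a worker pool can process
    # them concurrently with no inter-task communication.
    from concurrent.futures import ProcessPoolExecutor

    def process_burst(burst_id: int) -> str:
        """Placeholder for per-burst work (coregistration, interferogram
        formation, etc. in a real chain). Here it just reports completion."""
        # ... read burst data, process, write product ...
        return f"burst {burst_id:03d} done"

    if __name__ == "__main__":
        burst_ids = range(48)  # e.g., all bursts of one S1 IWS acquisition
        with ProcessPoolExecutor(max_workers=8) as pool:
            for result in pool.map(process_burst, burst_ids):
                print(result)
    ```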

  17. High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away

    NASA Astrophysics Data System (ADS)

    Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.

    2012-09-01

    By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. They investigated two exemplar large-scale science-driver workflow applications: 1) calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths, by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons learned for continuing development. Applicability of Cloud Computing: Commercial cloud providers generally charge for all operations, including processing, transfer of input and output data, and storage of data, so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory-bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault-tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel, distributed across cyberinfrastructure environments having different architectures. We have used the Pegasus Workflow Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing) involves establishing a distributed environment in which issues of, e.g., remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services.
In most of our work, we provisioned compute resources with a custom application called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
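
    As a concrete instance of the CPU-bound workload class discussed above, a periodogram search parallelizes trivially over light curves. A small sketch using SciPy's Lomb-Scargle routine (illustrative only; the NASA Star and Exoplanet Database periodogram code itself is written in C):

    ```python
    # CPU-bound workload: Lomb-Scargle periodogram of an unevenly
    # sampled light curve; thousands of curves can be mapped over a pool.
    import numpy as np
    from scipy.signal import lombscargle

    rng = np.random.default_rng(42)
    t = np.sort(rng.uniform(0.0, 30.0, 500))       # irregular observation times (days)
    true_period = 2.7
    y = 0.05 * np.sin(2 * np.pi * t / true_period) + 0.01 * rng.standard_normal(t.size)

    periods = np.linspace(0.5, 10.0, 4000)
    ang_freqs = 2 * np.pi / periods                # lombscargle expects angular frequencies
    power = lombscargle(t, y - y.mean(), ang_freqs)

    best = periods[np.argmax(power)]
    print(f"recovered period: {best:.3f} d (true {true_period} d)")
    ```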

  18. Liquid crystalline polymers in good nematic solvents: Free chains, mushrooms, and brushes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, D.R.M.; Halperin, A.

    1993-08-02

    The swelling of main-chain liquid crystalline polymers (LCPs) in good nematic solvents is theoretically studied, focusing on brushes of terminally anchored, grafted LCPs. The analysis is concerned with long LCPs, of length $L$, with $n_0 \gg 1$ hairpin defects. The extension behavior of the major axis, $R_\parallel$, of these ellipsoidal objects gives rise to an Ising elasticity with a free energy penalty of $F_{\mathrm{el}}(R_\parallel)/kT \approx n_0 - n_0 (1 - R_\parallel^2/L^2)^{1/2}$. The theory of the extension behavior enables the formulation of a Flory-type theory of the swelling of isolated LCPs, yielding $R_\parallel \approx \exp(2U_h/5kT)\, N^{3/5}$ and $R_\perp \approx \exp(-U_h/10kT)\, N^{3/5}$, with $N$ the degree of polymerization and $U_h$ the hairpin energy. It also allows the generalization of the Alexander model for polymer brushes to the case of grafted LCPs. The behavior of LCP brushes depends on the alignment imposed by the grafting surface and the liquid crystalline solvent. A tilting phase transition is predicted as the grafting density is increased for a surface imposing homogeneous, parallel anchoring. A related transition is expected upon compression of a brush subject to homeotropic, perpendicular alignment. The effect of magnetic or electric fields on these phase transitions is also studied. The critical magnetic/electric field for the Frederiks transition can be lowered to arbitrarily small values by using surfaces coated by brushes of appropriate density.
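
    As a quick consistency check (our expansion, not stated in the abstract), the Ising free energy reduces to a Hookean spring law for weak extensions, using $(1-x)^{1/2} \approx 1 - x/2$ for small $x$:

    ```latex
    \frac{F_{\mathrm{el}}(R_\parallel)}{kT}
      \approx n_0 - n_0\left(1 - \frac{R_\parallel^2}{L^2}\right)^{1/2}
      \;\approx\; \frac{n_0}{2}\,\frac{R_\parallel^2}{L^2}
      \qquad (R_\parallel \ll L)
    ```

    That is, a Gaussian elasticity with an effective spring constant set by the hairpin density $n_0/L^2$; the nonlinearity matters only as $R_\parallel$ approaches the full contour length $L$.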

  19. NLSEmagic: Nonlinear Schrödinger equation multi-dimensional Matlab-based GPU-accelerated integrators using compact high-order schemes

    NASA Astrophysics Data System (ADS)

    Caplan, R. M.

    2013-04-01

    We present a simple-to-use yet powerful code package called NLSEmagic to numerically integrate the nonlinear Schrödinger equation in one, two, and three dimensions. NLSEmagic is a high-order finite-difference code package which utilizes graphic processing unit (GPU) parallel architectures. The codes running on the GPU are many times faster than their serial counterparts and are much cheaper to run than on standard parallel clusters. The codes are developed with usability and portability in mind, and therefore are written to interface with MATLAB, utilizing custom GPU-enabled C codes with the MEX-compiler interface. The packages are freely distributed, including user manuals and set-up files. Catalogue identifier: AEOJ_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOJ_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 124453. No. of bytes in distributed program, including test data, etc.: 4728604. Distribution format: tar.gz. Programming language: C, CUDA, MATLAB. Computer: PC, MAC. Operating system: Windows, MacOS, Linux. Has the code been vectorized or parallelized?: Yes. Number of processors used: single CPU; number of GPU processors dependent on chosen GPU card (max is currently 3072 cores on GeForce GTX 690). Supplementary material: Setup guide, Installation guide. RAM: highly dependent on dimensionality and grid size; for a typical medium-large problem size in three dimensions, 4 GB is sufficient. Keywords: Nonlinear Schrödinger Equation, GPU, high-order finite difference, Bose-Einstein condensates. Classification: 4.3, 7.7. Nature of problem: Integrate solutions of the time-dependent one-, two-, and three-dimensional cubic nonlinear Schrödinger equation. Solution method: The integrators utilize a fully-explicit fourth-order Runge-Kutta scheme in time and both second- and fourth-order differencing in space. The integrators are written to run on NVIDIA GPUs and are interfaced with MATLAB, including built-in visualization and analysis tools. Restrictions: The main restriction for the GPU integrators is the amount of RAM on the GPU, as the code is currently designed to run only on a single GPU. Unusual features: Ability to visualize real-time simulations through the interaction of MATLAB and the compiled GPU integrators. Additional comments: Setup guide and Installation guide provided. The program has a dedicated web site at www.nlsemagic.com. Running time: A three-dimensional run with a grid dimension of 87×87×203 for 3360 time steps (100 non-dimensional time units) takes about one and a half minutes on a GeForce GTX 580 GPU card.
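
    The solution method named in the summary, explicit RK4 in time with central differencing in space, is easy to prototype. A minimal NumPy sketch for the 1D focusing cubic NLSE i·ψ_t + ψ_xx + 2|ψ|²ψ = 0 with periodic boundaries (our illustration, second-order in space only; NLSEmagic's GPU codes also provide fourth-order differencing):

    ```python
    # Explicit RK4 time stepping + 2nd-order central differences for the
    # 1D cubic NLSE  i*psi_t + psi_xx + 2|psi|^2 psi = 0  (focusing).
    # The bright soliton psi(x,t) = sech(x) * exp(i t) is an exact solution.
    import numpy as np

    def rhs(psi, dx):
        lap = (np.roll(psi, -1) - 2 * psi + np.roll(psi, 1)) / dx**2  # periodic
        return 1j * (lap + 2.0 * np.abs(psi) ** 2 * psi)

    def rk4_step(psi, dt, dx):
        k1 = rhs(psi, dx)
        k2 = rhs(psi + 0.5 * dt * k1, dx)
        k3 = rhs(psi + 0.5 * dt * k2, dx)
        k4 = rhs(psi + dt * k3, dx)
        return psi + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

    x = np.linspace(-20, 20, 801)[:-1]          # periodic grid, dx = 0.05
    dx, dt, nsteps = x[1] - x[0], 5e-4, 2000    # dt limited by explicit stability
    psi = 1.0 / np.cosh(x)                      # bright soliton at t = 0

    for _ in range(nsteps):
        psi = rk4_step(psi, dt, dx)

    t = nsteps * dt
    exact = np.exp(1j * t) / np.cosh(x)
    print(f"max error vs exact soliton at t={t:.2f}: {np.abs(psi - exact).max():.2e}")
    ```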

  20. Branson: A Mini-App for Studying Parallel IMC, Version 1.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Long, Alex

    This code solves the gray thermal radiative transfer (TRT) equations in parallel using simple opacities and Cartesian meshes. Although Branson solves the TRT equations, it is not designed for modeling production radiation-transport problems: Branson contains simple physics, does not have a multigroup treatment, and cannot use physical material data. The opacities are simple polynomials in temperature, and there is only a limited ability to specify complex geometries and sources. Branson was designed only to capture the computational demands of production IMC (implicit Monte Carlo) codes, especially in large parallel runs. It was also intended to foster collaboration with vendors, universities, and other DOE partners. Branson is similar in character to the neutron-transport proxy app Quicksilver from LLNL, which was recently open-sourced.
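
    For readers unfamiliar with the Monte Carlo side of IMC, the kernel a mini-app like Branson exercises is of the following flavor: particle histories stream through a mesh, with distances to collision sampled from an exponential distribution. A deliberately simplified 1D gray absorption sketch (ours, not Branson's physics; real IMC adds effective scattering, census, and time dependence):

    ```python
    # Simplified flavor of a Monte Carlo transport kernel: particles
    # stream through a 1D slab, with distance-to-absorption sampled as
    # d = -ln(xi)/sigma; tally where each history deposits its energy.
    import numpy as np

    rng = np.random.default_rng(7)
    slab_length, ncells, sigma = 4.0, 40, 1.0      # gray absorption opacity
    nparticles = 100_000
    deposit = np.zeros(ncells)

    # All histories start at x = 0 moving in the +x direction.
    d = -np.log(rng.random(nparticles)) / sigma     # sampled absorption distances
    absorbed = d < slab_length                      # the rest leak out the far side
    cells = np.minimum((d[absorbed] / slab_length * ncells).astype(int), ncells - 1)
    np.add.at(deposit, cells, 1.0)                  # per-cell energy deposition tally

    leakage = 1.0 - absorbed.mean()
    print(f"leakage fraction: {leakage:.4f} "
          f"(analytic exp(-sigma*L) = {np.exp(-sigma * slab_length):.4f})")
    ```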
